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COMPUTING  THE  OBSERVED  INFORMATION  MATRIX 
FOR  DYNAMIC  MIXTURE  MODELS 


1.  INTRODUCTION 

Mixture  distributions  are  widely  used  in  statistical  analysis  to  model  data  originating 
from  a  number  of  distinct  sources.  In  these  models,  each  source  is  represented  by  a  component 
of  the  mixture.  The  distance  between  the  sources  in  the  sample  space  determines  both  the 
level  of  difficulty  of  estimating  the  mixture  parameters,  and  the  amount  of  uncertainty  in  the 
parameter  estimates.  Intuitively,  the  uncertainty  associated  with  assigning  measurements  to 
sources  should  increase  as  the  distance  between  the  sources  in  the  sample  space  decreases. 
This  report  is  concerned  with  accurately  assessing  the  estimation  error  in  mixture  estimation 
problems  for  which  measurement-to-source  assignment  uncertainty  is  a  major  contributor  to 
uncertainty  in  the  data. 

Mixture  estimation  can  be  treated  as  a  missing  data  problem  where  the  missing  data  are 
the  measurement-to-source  assignments.  Consequently,  the  expectation-maximization  (EM) 
method  of  Dempster  et  al.  [1]  provides  a  convenient  iterative  approach  for  finding  maximum 
likelihood  estimates  of  the  mixture  parameters.  Indeed,  mixture  estimation  is  one  of  the 
numerous  applications  of  the  EM  method  discussed  in  their  paper.  The  EM  method  is  most 
useful  in  situations  where  the  corresponding  complete  data  (i.e.,  observed  data  -i-  missing 
data)  problem  has  a  straightforward  solution.  Gaussian  mixture  estimation  is  one  example. 
In  this  case,  application  of  the  EM  method  yields  a  sequence  of  weighted  linear  least-squares 
estimates  for  each  of  the  Gaussian  mean  vectors  that  converge  to  their  maximum  likelihood 
estimates.  Gaussian  mixture  models  have  been  studied  extensively  by  many  authors  (see,  for 
example,  the  monograph  by  McEachlan  and  Basford  [2]  and  the  references  therein)  and  are 
the  basis  for  the  more  complex  mixture  models  considered  here. 

The  main  criticisms  of  the  EM  method  are  that  it  does  not  give  an  immediate  expression 
for  the  error-covariance  matrix  for  the  estimated  parameters,  and  that  it  converges  slowly  near 
the  solution  (in  contrast  to  gradient-based  techniques,  which  at  least  approximate  the  error- 
covariance  matrix  for  the  parameter  estimates,  and  often  converge  rapidly  near  the  solution). 
Eouis  [3]  addresses  both  criticisms  in  his  paper  on  finding  the  observed  information  matrix 
when  using  the  EM  method  for  incomplete  data  problems.  In  his  paper,  Eouis  shows  that 
the  observed  information  matrix,  defined  as  minus  the  second  derivative  of  the  observed  (or 
incomplete)  data  log-likelihood  function,  evaluated  at  the  maximum  likelihood  estimate,  is 
obtained  by  straightforward  manipulations  of  the  complete  data  log-likelihood  function.  The 
inverse  of  this  matrix  can  then  be  used  as  an  estimate  of  the  error-covariance  matrix  for  the 
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estimated  parameters,  as  suggested  by  Efron  and  Hinkley  [4].  Furthermore,  Louis  shows  that 
the  observed  information  matrix  ean  be  used  to  aeeelerate  eonvergenee  of  the  EM  iterations. 

In  this  report  Louis’s  results  are  applied  to  an  important  elass  of  mixture  models  termed 
“dynamie  mixture  models.”  A  dynamie  (or  time-varying)  mixture  model  eonstitutes  a  se¬ 
quence  of  standard  mixture  distributions  related  in  time  by  a  proeess  model.  Dynamie  mixture 
distributions  are  used  to  model  data  eolleeted  over  time  originating  from  a  number  of  distinet 
moving  sourees.  (Here,  souree  motion  refers  to  a  ehange  over  time  of  any  oharaeteristie  of 
the  souree — for  example,  loeation,  orientation,  and  intensity.)  If  the  sourees  are  stationary, 
one  ean  pool  data  eolleeted  over  multiple  sampling  times  and  use  a  standard  (statie)  mixture 
to  deseribe  the  sample  distribution.  However,  if  the  sourees  are  non-stationary,  one  must  ae- 
eount  for  souree  motion  in  the  mixture  to  aeeurately  model  the  distribution  of  the  sample  at 
eaeh  sampling  time.  A  dynamie  mixture  model  may  be  deterministie  or  stoehastie.  In  the 
former  ease,  a  parametrie  motion  model  is  used  to  deseribe  the  trajeetory  of  eaeh  souree  in 
the  mixture.  In  the  latter  ease,  the  trajeetory  of  eaeh  souree  is  treated  as  a  sequenee  of  random 
variables  whose  mean  evolves  aeeording  to  a  deterministie  motion  model.  In  either  ease,  the 
objeetive  of  this  report  is  to  eompute  the  observed  information  matrix  for  the  estimated  mix¬ 
ture  parameters,  and  to  assess  the  quality  of  this  matrix  (or,  more  preeisely,  the  inverse  of  this 
matrix)  as  a  eharaeterization  of  estimation  error. 

1.1  RELATED  WORK 

Work  related  speeifieally  to  eomputing  the  observed  information  matrix  for  dynamie 
mixture  models — and,  more  generally,  to  assessing  the  impaet  of  measurement-to-souree  as¬ 
signment  uneertainty  on  estimation  error — is  found  in  both  the  statisties  and  engineering  liter¬ 
ature.  One  paper  from  the  statisties  literature  is  partieularly  relevant.  In  [5],  Meng  and  Rubin 
propose  the  supplemented  EM  (SEM)  algorithm  as  an  alternative  approaeh  for  eomputing 
the  error-eovarianee  matrix  for  parameter  estimates  obtained  using  the  EM  method.  Their 
approaeh,  in  eontrast  to  Louis’s  analytieal  approaeh,  is  half  analytieal  and  half  numerieal.  In 
short,  the  SEM  algorithm  requires  analytieal  differentiation  to  obtain  the  information  matrix 
assoeiated  with  the  eomplete  data,  but  uses  numerieal  differentiation  to  eompute  the  informa¬ 
tion  matrix  assoeiated  with  the  missing  data.  The  differenee  between  these  two  matriees  is  the 
observed  information  matrix,  whieh  is  then  inverted  to  obtain  the  error-eovarianee  matrix.  For 
problems  where  the  algebraie  analysis  required  by  Louis’s  proeedure  is  tedious  or  intraetable, 
the  SEM  algorithm  is  an  attraetive  alternative. 

Two  other  papers  from  the  statisties  literature  are  worth  mentioning.  Green  [6]  dis- 
eusses  the  EM  method  in  the  eontext  of  maximum  penalized  likelihood  estimation  (mathe- 
matieally  equivalent  to  maximum  a  posteriori  estimation),  a  problem  for  whieh  he  laments 
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the  EM  method  has  seen  little  use.  In  his  paper,  Green  makes  a  simple  modification  to  the  EM 
algorithm  for  maximum  penalized  likelihood  estimation  to  obtain  the  one-step-late  (OSE)  al¬ 
gorithm,  which  is  often  easier  to  compute  and  converges  at  least  as  quickly.  Green’s  paper 
is  relevant  to  this  discussion  because  the  estimation  problem  for  stochastic  dynamic  mixture 
models  is  a  maximum  a  posteriori  estimation  problem  for  which  the  OSE  algorithm  may 
be  potentially  useful.  In  a  later  paper,  Segal  et  al.  [7]  combine  the  results  of  Meng,  Rubin, 
and  Green  to  compute  error- variances  via  the  SEM  algorithm  for  maximum  penalized  like¬ 
lihood  estimates  obtained  using  the  OSE  algorithm.  Their  approach  is  directly  applicable  to 
computing  the  error-covariance  matrix  for  stochastic  dynamic  mixture  models.  However,  this 
report  will  show  that  the  algebraic  analysis  required  by  Eouis’s  approach  yields  insightful  ex¬ 
pressions  for  the  particular  stochastic  dynamic  mixture  model  considered  here — namely,  the 
linear  Gauss-Markov  model. 

In  the  engineering  literature,  and  the  target  tracking  literature  in  particular,  related  work 
falls  into  three  overlapping  categories:  mixture  models  for  multiple  target  tracking,  informa¬ 
tion  reduction  factors  for  single  target  tracking  in  clutter,  and  minimum  variance  (Cramer- 
Rao)  bound  calculations  for  tracking  performance  prediction.  The  report  by  Streit  and  Eugin- 
buhl  [8]  and  the  paper  by  Gauvrit  et  al.  [9]  are  the  primary  references  for  the  mixture  model 
approach  to  multiple  target  tracking  considered  in  this  report.  In  this  approach,  each  target 
is  represented  by  a  component  (or  possibly  a  collection  of  components)  in  a  mixture  model 
for  the  measurement  distribution.  By  the  very  nature  of  this  model,  it  is  assumed  that  every 
measurement  originates  from  all  the  targets;  more  precisely,  each  measurement  is  assigned  to 
every  target  with  a  certain  probability.  This  unorthodox  tracking  model  is  a  contradistinction 
to  the  widely  accepted  multiple  hypothesis  tracking  (MHT)  model  proposed  by  Reid  [10],  in 
which  each  measurement  is  assigned  to  one  and  only  one  target,  or  to  clutter  (background 
noise).  While  the  MHT  assignment  model  is  perhaps  more  realistic,  it  leads  to  a  set  of  track 
hypotheses  that  grows  exponentially  with  the  number  of  measurements.  Consequently,  MHT 
algorithms  require  sophisticated  heuristics  to  manage  hypothesis  enumeration,  which  typi¬ 
cally  involves  pruning  and  merging  branches  on  the  hypothesis  tree.  Alternatively,  the  mix¬ 
ture  model  algorithms  have  complexity  that  is  roughly  linear  in  the  numbers  of  measurements 
and  targets;  thus,  hypothesis  management  is  not  required  for  these  algorithms.  However,  as 
succinctly  put  by  Streit  in  [11],  the  price  to  be  paid  for  the  “heresy”  of  violating  the  one- 
measurement-per-target  rule  of  multiple  target  tracking  is  a  likelihood  function  that  may  be 
“riddled”  with  local  maxima.  Streit  goes  on  in  [11]  to  blend  a  mixture  model  with  “limited 
enumeration”  to  address  this  problem. 

The  stochastic  dynamic  mixture  model  discussed  in  this  report  is  precisely  the  mixture 
model  used  by  Streit  and  Euginbuhl  [8]  and  Gauvrit  et  al.  [9]  for  multiple  target  tracking. 
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Streit  and  Luginbuhl  call  the  approach  “probabilistic  multi-hypothesis  tracking  (PMHT).” 
Of  these  works,  only  the  former  considers  computation  of  the  error-covariance  matrices  for 
the  track  estimates.  However,  the  analysis  of  Streit  and  Luginbuhl  is  incomplete  in  that  the 
matrices  identified  in  their  report  as  the  error-covariance  matrices  for  PMHT  do  not  account 
for  the  information  lost  to  the  missing  data.  Hence,  what  have  become  widely  accepted  as  the 
error-covariance  matrices  for  PMHT  are  incorrect  and,  worse,  as  will  be  shown  in  this  report, 
are  overly-optimistic.  This  report  gives  a  precise  statistical  interpretation  of  these  matrices 
and  derives  expressions  for  the  correct  error-covariance  matrices  for  PMHT  that  account  for 
the  information  lost  to  the  missing  data. 

Two  additional  approaches  related  to  PMHT  must  also  be  acknowledged.  Avitzour’s 
work  [12],  a  remarkably  similar  but  independent  antecedent  to  PMHT,  is  perhaps  the  first  ap¬ 
plication  of  missing  data  and  EM  to  multiple  target  tracking.  The  similarities  and  differences 
between  the  two  approaches  are  discussed  in  [8];  notably,  PMHT  substitutes  a  stochastic 
(Markovian)  motion  model  for  the  deterministic  (polynomial)  motion  model  of  Avitzour’s 
approach.  Also,  Avitzour  does  not  discuss  computation  of  error-covariance  matrices  for  track 
estimates.  The  multiple  target  tracking  approach  proposed  by  Molnar  and  Modestino  [13] 
also  uses  missing  data  and  EM,  although  their  measurement-to-target  assignment  model  is 
markedly  different  than  that  of  Avitzour’s  or  PMHT’s.  Nevertheless,  Molnar  and  Modestino 
propose  an  approximation  to  the  error-covariance  matrix  for  the  target  state  estimates  that  ex¬ 
plicitly  accounts  for  measurement-to-target  assignment  uncertainty.  In  their  approximation, 
each  measurement’s  contribution  to  the  total  information  content  of  all  the  measurements  with 
respect  to  each  target  is  scaled  by  the  measurement-to-target  assignment  probability.  Neither 
the  development  nor  the  quality  of  this  approximation  is  discussed  in  [13].* 

The  notion  that  measurement  assignment  uncertainty  in  tracking  should  increase  the 
variance  of  the  track  estimates  is  not  new.  Eor  example,  in  [14]  Eortmann  et  al.  analyze 
the  effect  of  clutter  on  the  update  of  the  target  state  covariance  matrix.  Specifically,  they 
consider  a  deterministic  approximation  to  the  stochastic  matrix  Riccati  equation  associated 
with  the  probabilistic  data  association  (PDA)  filter  of  Bar-Shalom  and  Tse  [15].  (Recall  that 
the  matrix  Riccati  equation  of  Kalman  filtering  theory  is  a  recursion  for  the  update  of  the 
state  covariance  matrix.)  This  approximation,  which  replaces  the  random  (data-dependent) 
quantities  in  the  stochastic  matrix  Riccati  equation  with  their  expected  values,  leads  to  a 
modified  matrix  Riccati  equation  that  looks  like  the  standard  equation,  with  the  addition  of 
a  scalar  factor  in  front  of  the  Kalman  gain  term.  This  scalar  factor,  called  the  information 
reduction  factor  and  denoted  q2  in  their  paper,  takes  values  between  0  and  1 ;  these  extremes 
correspond  to  total  assignment  uncertainty  and  no  assignment  uncertainty,  respectively.  A 

*Note  that  the  critical  term  in  this  approximation  (the  terni  in  brackets  in  [13,  equation  (48)])  is  missing  an  inverse. 
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value  of  g2  =  0  eliminates  the  Kalman  gain  term,  so  that  the  updated  state  covariance  matrix 
equals  the  predicted  state  covariance  matrix;  a  value  of  g2  =  1  reduces  the  modified  matrix 
Riccati  equation  to  the  standard  equation,  so  that  the  updated  state  covariance  matrix  is  equal 
to  the  minimum  state  covariance  matrix.  The  information  reduction  factor  q2  is  a  function 
of  the  probability  of  detection  Pd  and  the  probability  of  false  alarm  Pfa,  with  g2  =  1  when 
Fd  =  1  and  PpA  =  0,  and  0  <  g2  <  1  when  Pd  <  1  and  PpA  >  0.  In  short,  the  information 
reduction  factor  accounts  for  measurement  assignment  uncertainty  due  to  missed  detections 
and  clutter.  Computation  of  q2  is  nontrivial,  and  is  described  by  Gelfand  et  al.  in  [16]. 

The  most  commonly  used  baseline  forjudging  target  tracking  performance  is  the  Cramer- 
Rao  lower  bound  (CRLB),  that  is,  the  minimum  variance  bound  on  estimation  error.  Recall 
that  the  CRLB  is  defined  in  terms  of  an  average  (expectation)  over  all  possible  values  of 
the  observed  data.  Hence,  for  any  given  tracking  model,  the  CRLB  can  be  used  to  predict 
tracking  performance  in  the  absence  of  measurements.  The  multiple  target  tracking  problem 
complicates  computation  of  the  CRLB,  since  the  measurement-to-target  assignments  are  al¬ 
most  never  observed.  The  approach  then  is  to  marginalize  over  the  assignment  hypotheses  and 
compute  the  minimum  variance  bound  based  on  the  marginal  distribution  of  measurements 
and  target  states.  This  is  the  approach  presented  by  Daum  in  [17].  As  described  by  Daum, 
computation  of  this  marginal  is  impractical,  as  the  number  of  association  hypotheses  is  enor¬ 
mous  even  for  small  problems.  To  address  this  problem,  Daum  provides  a  family  of  lower 
bounds  on  the  minimum  variance  bound,  where  each  member  of  the  family  corresponds  to  a 
collection  of  association  hypotheses  that  include  the  correct  hypothesis.  The  lower  bound  cor¬ 
responding  to  the  set  of  all  possible  hypotheses  corresponds  to  the  minimum  variance  bound, 
whereas  the  lower  bound  corresponding  to  the  subset  that  contains  only  the  correct  hypothesis 
corresponds  to  the  trivial  lower  bound,  which  ignores  measurement  assignment  uncertainty 
entirely.  Thus,  tighter  bounds  can  be  achieved  at  the  expense  of  computational  complexity  by 
considering  progressively  larger  sets  of  association  hypotheses. 

Further  approximations  to  the  CRLB  when  there  is  measurement  assignment  uncer¬ 
tainty  have  been  computed  by  Jauffret  and  Bar-Shalom  [18]  and  Kirubarajan  and  Bar-Shalom 
[19]  for  tracking  a  single  target  in  clutter.  Both  works  use  the  approach  of  Fortmann  et  al.  [14] 
to  show  that  the  Fisher  information  matrix  (FIM)  for  this  problem  is  a  scalar  multiple  of  the 
FIM  for  the  problem  without  clutter,  where  the  scalar  multiple  is  the  information  reduction 
factor  q2  developed  in  [14].  It  follows  that  the  CRLB  (inverse  of  the  FIM)  increases  rapidly 
with  the  amount  of  assignment  uncertainty  and,  in  fact,  grows  without  bound  as  q2  — >  0. 
Subsequent  papers  by  Willett  and  Bar-Shalom  [20]  and  Niu  et  al.  [21]  extend  these  results  by 
finding  a  set  of  sufficient  conditions  for  the  class  of  models  for  single  target  tracking  in  clutter 
whose  CRLB  takes  this  form. 
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Another  set  of  referenees  relevant  to  this  report  deal  with  CRLB  eomputations  for  the 
mixture  model  approaeh  to  multiple  target  traeking.  In  [22],  Perlovsky  develops  explieit 
CRLB  expressions  for  the  parameters  of  a  normal  (Gaussian)  mixture  model  of  the  measure¬ 
ment  distribution  for  traeking  multiple  targets  in  elutter.  This  work  is  based  on  his  earlier 
paper  [23]  on  eomputing  the  CRLB  for  normal  mixtures.  While  the  CRLB  expressions  de¬ 
veloped  in  these  papers  are  indeed  explieit,  they  are  written  in  terms  of  quantities  (“elass 
overlap”  terms)  that  ultimately  require  numerieal  evaluation  of  multi-dimensional  integrals 
(expeetations  over  all  values  of  the  observed  data),  where  there  are  as  many  integrals  as  there 
are  observations,  and  eaeh  integral  has  dimension  equal  to  that  of  a  data  point.  Le  Cadre 
et  al.  [24]  address  the  same  problem  in  their  paper  on  eomputing  the  CRLB  for  the  multi¬ 
ple  target  version  of  a  elassie  subjeet  in  the  traeking  literature  known  as  bearings-only  target 
motion  analysis.  As  in  Perlovsky’s  papers,  the  elass  overlap  or  “souree  interaetion”  terms  in- 
dueed  in  the  CRLB  expressions  by  the  mixture  model  for  measurement-to-target  assignment 
lead  to  integrals  that  eannot  be  evaluated  in  elosed  form.  Among  the  eontributions  in  [24] 
are  analytieal  approximations  to  these  integrals  based  on  series  expansions.  In  [25],  Hue  et 
al.  derive  reeursive  formulas  for  eomputing  the  “posterior”  CRLB  for  multiple  target  traek¬ 
ing  assuming  stoehastie  (Markovian)  target  motion  and  three  different  measurement-to-target 
assignment  models — namely,  the  known  assignment  model,  the  one-measurement-per-target 
model,  and  the  PMHT  model.  The  posterior  CRLB  for  the  PMHT  model  is  elosely  related 
to  the  posterior  observed  information  matrix  derived  in  this  report;  speoifieally,  the  posterior 
CRLB  is  the  inverse  of  the  expeeted  value  of  the  posterior  information  matrix  over  all  values 
of  the  observed  data  and  all  values  of  the  target  states.  These  expeetations  are  evaluated  in 
Hue  et  al.  [25]  using  Monte-Carlo  integration  teehniques.  Notably,  Hue  et  al.  show  that  mea¬ 
surement  assignment  uneertainty  raises  the  posterior  lower  bound  on  estimation  error,  often 
substantially.  Analogous  results  are  obtained  in  this  report  with  regard  to  the  impaet  of  mea¬ 
surement  assignment  uneertainty  on  the  in-situ  assessment  of  estimation  error  given  by  the 
inverse  of  the  posterior  observed  information  matrix. 

The  work  most  relevant  to  the  present  diseussion  among  this  set  of  referenees  is  the 
report  by  Graham  and  Streit  [26],  whieh  diseusses  eomputation  of  the  CRLB  for  PMHT  and 
whieh  shows  that  the  Fisher  information  matrix  for  PMHT  is  equal  to  the  Fisher  informa¬ 
tion  matrix  derived  from  the  eomplete  data  likelihood  funetion,  minus  the  information  matrix 
assoeiated  with  the  missing  data  (measurement-to-target  assignments).  This  result  is  a  man¬ 
ifestation  of  the  “missing  information  prineiple”  to  be  diseussed  in  more  detail  later  in  this 
report.  Thus,  the  eomplete  data  lower  bound  obtained  by  inverting  the  eomplete  data  Fisher 
information  matrix,  whieh  Graham  and  Streit  show  to  be  bloek-diagonal,  is  a  lower  bound 
on  the  CRLB.  This  lower  bound  on  the  lower  bound,  they  argue,  is  analogous  to  the  trivial 
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lower  bound  of  Daum’s  approach.  The  present  report  is  not  concerned  with  the  computation 
of  the  Fisher  information  matrix  and  the  minimum  variance  bound,  which  are  independent 
of  observed  data,  but  rather  with  the  observed  information  matrix  and  its  inverse,  and  their 
assessment  of  estimation  accuracy  as  a  function  of  the  measurements.  Furthermore,  explicit 
closed-form  expressions  for  the  complete  data  and  missing  data  information  matrices,  in  terms 
of  the  maximum  likelihood  estimates  for  the  mixture  parameters,  are  derived. 

Finally,  a  recent  paper  by  Cai  et  al.  [27]  presents  an  EM  algorithm  for  tracking  maneu¬ 
vering  targets  in  clutter,  and  uses  the  SEM  algorithm  to  compute  the  error-covariance  matrix 
for  the  estimated  target  parameters.  Theirs  appears  to  be  the  first  use  of  the  SEM  algorithm  in 
the  tracking  literature.  In  fact,  their  algorithm  is  essentially  Avitzour’s  algorithm  with  posi¬ 
tion  and  amplitude  measurements.  Hence  their  paper  contains  the  first  accurate  computation 
in  the  literature  of  the  error-covariance  matrix  for  a  PMHT-related  algorithm.  In  contrast,  the 
error-covariance  matrix  for  PMHT  is  computed  in  this  report  using  the  analytical  approach  of 
Louis.  The  benefit  of  Louis’s  approach  in  this  case  is  that  it  gives  precise  statistical  meaning 
to  certain  quantities  fundamental  to  the  PMHT  computations  that  have  often  been  incorrectly 
interpreted  as  the  error-covariance  matrices  for  PMHT. 

1.2  REPORT  ORGANIZATION 

To  establish  terms  and  notation  used  throughout  this  report,  the  EM  method  for  maxi¬ 
mum  likelihood  estimation  for  incomplete  data  problems  is  introduced  in  section  2.  In  section 
3,  Louis’s  derivation  of  the  observed  information  matrix  for  the  general  case  of  incomplete 
data  is  summarized,  and  a  simplification  of  his  expressions  for  the  special  case  of  indepen¬ 
dent  observations  is  presented.  Also  in  this  section,  the  posterior  observed  information  matrix, 
which  is  used  to  compute  the  error-covariance  matrix  for  stochastic  dynamic  mixture  models, 
is  defined.  In  section  4,  maximum  likelihood  estimation  for  finite  mixture  models  using  the 
EM  method  is  reviewed,  and  general  expressions  for  the  corresponding  observed  information 
matrix  are  presented.  Expressions  for  the  maximum  likelihood  estimates  and  the  observed 
information  matrix  for  Gaussian  mixture  models  are  also  given  in  this  section. 

The  observed  information  matrix  and  the  posterior  observed  information  matrix  for  the 
deterministic  and  stochastic  dynamic  mixture  models,  respectively,  are  derived  in  section  5. 
In  both  cases,  the  maximization  step  at  the  final  EM  iteration  is  shown  to  be  equivalent  to  a 
standalone  estimation  problem  for  which  the  error-covariance  matrix  is  given  by  the  inverse 
of  the  complete  data  information  matrix  in  the  case  of  deterministic  motion,  and  the  inverse 
of  the  posterior  complete  data  information  matrix  in  the  case  of  stochastic  motion.  The  latter 
result  provides  a  precise  statistical  interpretation  of  the  error-covariance  matrices  obtained  as 
byproducts  of  PMHT  for  the  linear-Gaussian  case. 
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Section  6  discusses  the  suitability  of  the  inverse  of  the  observed  information  and  poste¬ 
rior  observed  information  matrices  as  estimates  of  the  error-covariance  matrices  for  the  deter¬ 
ministic  and  stochastic  dynamic  mixture  models,  respectively,  and  the  cost  of  computing  these 
inverses  when  the  number  of  sampling  times  is  large.  Section  7  includes  two  target  tracking 
examples  using  the  stochastic  dynamic  mixture  model,  one  of  two  crossing  targets,  and  one 
of  a  single  target  in  clutter.  The  consistency  of  the  target  parameter  estimates  is  examined  for 
each  example.  The  report  concludes  in  section  8  with  a  summary  of  findings,  a  discussion  of 
alternative  approaches  for  computing  the  observed  information  matrix,  in  particular  the  SEM 
algorithm,  and  a  discussion  of  topics  for  future  investigation. 
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2.  MAXIMUM  LIKELIHOOD  ESTIMATION  USING  THE  EM  METHOD 


2.1  GENERAL  CASE 


Consider  the  general  incomplete  data  problem  in  which  there  are  two  sample  spaces, 
the  complete  data  sample  space  X  and  the  incomplete  (or  observed)  data  sample  space  y, 
and  a  many-to-one  mapping  Y  :  X  ^  y.  Let  x  denote  an  arbitrary  point  in  X.  The  point 
X  is  not  observed  directly;  rather,  the  point  y  =  Y{x)  my  is  observed.  Assume  a  family 
of  density  functions  fx{x]  9)  on  X  indexed  by  the  parameter  vector  9  from  a  space  Let 
L  =  {x  :  Y (x)  =  y}  denote  the  section  of  X  determined  by  y.  The  corresponding  family  of 
observed  data  density  functions  /r  (r/;  9)  on  3^  is  given  by 

fY{y]9)  =  j  fx{x;9)dx,  (2-1) 

where  integration  here  is  meant  in  the  most  general  sense. 

For  a  fixed  sample  y,  the  density  function  friy]  9)  taken  as  a  function  of  the  parameter 
vector  9  is  the  incomplete  or  observed  data  likelihood  function.  Let  Ay  (r/;  9)  =  log  /y  (r/;  9) 
denote  the  observed  data  log-likelihood  (or  support)  function.  The  maximum  likelihood  es¬ 
timate  of  the  parameter  vector  9,  denoted  9,  is  that  value  of  9  in  the  space  O  that  maximizes 
fviy,  9)  or,  equivalently,  Ay(r/;  9)  for  the  given  sample  y,  that  is, 

9  =  argmax  XY{y;9).  (2-2) 

0 

Let  Xx{x',  9)  denote  the  complete  data  support  function  for  a  fixed  sample  x.  Using  the  EM 
method,  the  maximum  likelihood  estimate  of  9  is  obtained  by  solving  the  following  sequence 
of  complete  data  problems: 


9(k+i)  ^  [Ax (AT; 9)\X  eL]  (2-3) 

6 

for  /c  =  0, 1, . . . ,  where  9^^'>  is  an  initial  estimate  of  9,  and 


Ee,,,[Xx{X-9)\X  eL]=  j^Xx{x-,9)fxiY{x\y9^’^^)dx  (2-4) 

is  the  conditional  expectation  of  the  complete  data  support  function  at  the  kth  iteration.  As¬ 
suming  that  the  observed  data  density  functions  /y(r/;  9)  are  strictly  positive,  the  conditional 
density  functions  fx\Y{x\y,  9)  are  defined  by 


fx\Y{x\y,9) 


fx{x]9) 

fY{y;9) 


fx{x;9) 

Jl  fx{x;9)dx' 


(2-5) 


Expressions  (2-4)  and  (2-3)  are  the  expectation  step,  or  E-step,  and  the  maximization  step,  or 
M-step,  respectively,  of  the  EM  method.  The  sequence  9^^^^  of  EM  iterates  converges  to  the 
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maximum  likelihood  estimate  9  under  the  regularity  conditions  stated  in  Dempster  et  al.  [1] 
and  Wu  [28].  These  conditions  are  assumed  to  hold  here  and  throughout  the  report.  The  usual 
regularity  conditions  for  the  existence  of  maximum  likelihood  estimates  and  information  ma¬ 
trices,  and  the  interchangeability  in  order  of  differentiation  and  integration,  are  also  assumed 
in  the  sequel.  (See,  for  example,  Cramer  [29,  chapters  7,  32,  and  33]  and  Casella  and  Berger 
[30,  chapters  2  and  7]  for  statements  of  these  conditions.) 


2.2  INDEPENDENT  OBSERVATIONS 


Suppose  the  complete  data  x  consists  of  n  independent  (but  not  necessarily  identically 
distributed)  samples  xi, . . . ,  Xn-  These  samples  are  not  observed  directly;  rather,  the  samples 
yi  =  Yi{xi),. . .  ,yn  =  Yn{xn}  through  the  many-to-one  mappings  Yi  :  Vi  ^  3^i,  . . . , 
Yn  :  Xn  ^  yn-  Then,  L  =  Li  x  ■  ■  ■  x  Ln,  where  =  {x  :  Yi{x)  =  r/*}  is  the  section  of 
Xi  determined  by  yi.  Consequently,  the  complete  data  and  observed  data  likelihood  functions 
fx{x]9)  and  fY{y',9)  become  products,  and  the  corresponding  support  functions  Xxix-,9) 
and  Ay(r/;  9)  become  summations.  Substituting  these  results  into  (2-3)  and  interchanging  the 
order  of  integration  and  summation  gives 


n 


9{k+i)  ^  argmax  V]  Eg(k)[Xxi{^i;9)  \  Xi  e  Li] 

i=\ 

for  the  update  of  the  parameter  vector  9^9  ^  where 


(2-6) 


E,(.)[Ax,(X,;0)|X,  gL,]=  f  XxAx^■,9)fxM^^\y^■,9^'^^)dx,  (2-7) 

JLi 

is  the  conditional  expectation  of  the  support  function  for  the  complete  data  vector  Xi  at  the 
fcth  iteration,  and 


fxAxu9) 

Jl.  fxAxi]0)dxi 


is  the  conditional  density  function  of  Xi  given  the  observed  data  vector  y^. 


(2-8) 


2.3  MAXIMUM  A  POSTERIORI  ESTIMATION 

The  EM  method  can  also  be  used  to  find  the  maximum  a  posteriori  estimate  of  6*  in  a 
Bayesian  model  for  the  parameter  vector.  Let  0  denote  the  random  variable  associated  with  9, 
let  fe{9)  denote  its  prior  density  function,  and  let  Ae(6*)  =  log  fe{9)  denote  the  prior  support 
function.  The  posterior  observed  data  support  function  Aejy  (6'|r/)  =  log  fe\Y{9\y)  is  obtained 
using  Bayes’  formula: 


Xe\Y{9\y)  —  XY\e{y\9)  +  Xe{9)  —  XY^y)-  (2-9) 
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The  observed  data  support  funetion  Xyiii^O)  is  written  in  this  expression  as  Ay|©(|/|6*)  to 
emphasize  Bayesian  eonditioning  rather  than  parametrie  dependenee  on  0.  The  maximum  a 
posteriori  estimate  of  6,  denoted  6,  is  that  realization  of  the  random  variable  0  that  maximizes 
fe\Y{(^\y)  or,  equivalently,  Aejy  (^li/),  given  the  sample  y,  in  other  words, 

e  =  argmaxA©jy(6'||/)  =  argmax  {Ay|0(|/|6')  +  A©(6')}  .  (2-10) 

6  9 

As  diseussed  in  [1],  the  maximum  a  posteriori  estimate  of  6  is  obtained  using  the  EM  method 
by  solving  the  following  sequenee  of  eomplete  data  problems: 

=  argmax  {E.w [Ax|0(X|0)  |  X  G  L]  +  Ae(0)}  (2-11) 

for  A:  =  0, 1, ,  where  0*^°^  is  an  initial  estimate  of  9,  and  Ax|e(X|0)  is  the  eomplete  data 
support  funetion  eonditioned  on  9.  The  arguments  in  [1]  and  [28]  imply  that  eaeh  EM  iteration 
inereases  the  value  of  Xe\Yi9\y).  Also,  as  stated  in  [1],  when  fe{9)  is  a  natural  eonjugate  prior 
density  funetion  for  0,  the  funetion  to  be  maximized  in  (2-11)  often  has  the  same  form  as  that 
in  (2-3)  and,  so,  ean  be  maximized  in  the  same  way.  This  is  indeed  the  ease  for  the  Gaussian 
mixture  models  diseussed  later.  (Reeall  that  a  natural  eonjugate  prior  density  funetion  for  0 
has  the  property  that  the  posterior  density  funetion  is  a  member  of  the  same  family  of  density 
funetion.  See  Redner  et  al.  [31]  for  a  diseussion  of  a  natural  family  of  priors,  whieh  they 
refer  to  as  a  family  of  “elass  eonditional”  priors,  for  mixtures  of  density  funetions  of  the 
exponential  type.) 
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3.  OBSERVED  INFORMATION  FOR  INCOMPLETE  DATA  PROBLEMS 


Louis’s  approach  to  computing  the  observed  information  matrix  for  the  general  ease 
of  ineomplete  data  is  presented  in  this  seetion,  as  well  as  a  simplifieation  of  his  result  for 
the  speeial  ease  of  independent  observations.  The  posterior  observed  information  matrix, 
used  here  as  an  information  measure  for  stoehastie  dynamie  mixture  models,  is  also  defined. 
Throughout  this  seetion,  the  observation  y  is  assumed  given. 


3.1  GENERAL  CASE 


Before  deriving  the  observed  information  matrix  for  the  general  ease,  some  additional 
notation  and  useful  identities  are  presented.  (The  notation  adopted  here  follows  Louis’s,  with 
a  few  differenees.  His  derivation  is  found  in  the  appendix  of  [3].)  Let  Sx{x;  6)  and  Bx{x]  9) 
denote  the  first  derivatives  and  negative  seeond  derivatives  with  respeet  to  the  parameter  vee- 
tor  9  of  the  eomplete  data  support  funetion  Xxix;  9),  respeetively.  Likewise,  let  Syiv;  9)  and 
Byiv,  9)  denote  the  corresponding  derivatives  of  the  observed  data  support  funetion.  (The 
funetions  Sx{x]  9)  and  Sy{y]  9)  are  often  referred  to  as  the  eomplete  data  and  observed  data 
seore  funetions,  respeetively.)  These  definitions  lead  to  the  following  identities: 

Sxi.x-,e)  =  \'x(x-,0)  =  (3-1) 

fx{x-,9) 

-Bx(x-,0)  =  ^  ppp  -  Sx{x-,e)sl(x-,0).  0-2) 

Jx[X]  U) 

Taking  the  eonditional  expeetation  of  these  expressions  as  in  (2-4)  yields  the  identities 


'fxiX;9) 

fx{X-,9) 

fx{X-,9) 


X  e  L 
X  e  L 


Eg[Sx{X;9)\X  el],  (3-3) 

-Eg[Bx{X-,9)\X  eL] 

+Eg  [^^(X;  0)5l(X;  9)\XeL].  (3-4) 


The  information  matrix,  denoted  Iy{y]  9),  is  by  definition  equal  to  minus  the  seeond 
derivative  with  respeet  to  the  parameter  veetor  9  of  the  observed  data  support  funetion: 


Iy{y-9)  =  -X'f{y-9)=By{y-9).  (3-5) 

Evaluated  at  the  maximum  likelihood  estimate  0,  the  information  matrix  is  nailed  the  observed 
information  matrix  or,  sometimes,  the  observed  Fisher  information  matrix.  The  expected 
( Fisher)  information  matrix  is  defined  as  the  expeeted  value  of  the  information  matrix  iY{y;0) 
evaluated  at  the  “true”  value  of  the  parameter  veetor  9,  denoted  9*;  that  is, 

I{9*)=  [  Iy{y-9*)fy{y-9*)dy.  (3-6) 

Jy 
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(See  Edwards  [32]  for  statements  of  these  definitions.)  In  praetiee,  the  inverse  of  the  expeeted 
value  of  the  information  matrix  evaluated  at  9,  that  is,  I~^{9),  is  often  used  as  an  estimate  of 
the  error-eovarianee  matrix  for  9.  However,  Efron  and  Hinkley  [4]  give  several  justifieations 
for  preferring  the  inverse  of  the  observed  information  matrix,  namely,  ly'  {y;  9),  as  a  mea¬ 
sure  of  estimation  uneertainty,  at  least  for  the  sealar  parameter  ease.  As  noted  by  Eouis,  the 
observed  information  matrix  is  often  mueh  easier  to  eompute  than  the  expeeted  information 
matrix.  This  is  eertainly  the  ease  for  mixture  distributions. 

The  information  matrix  Jy  (y;  9)  as  given  by  (3-5)  is  often  diffieult  to  eompute  explieitly 
due  to  the  eomplexity  of  the  observed  data  support  funetion  Xyiy,  9).  Indeed,  it  is  for  this 
reason  that  the  EM  method  is  often  used  in  the  first  plaee.  The  goal  of  the  EM  method  is  to 
obtain  an  expression  for  the  maximum  likelihood  estimate  9  in  terms  of  the  simpler  eomplete 
data  support  funetion  Ax  (a;;  9).  Eikewise,  the  goal  here  is  to  obtain  an  expression  for 
in  terms  of  the  eomplete  data  support  funetion  and  its  derivatives. 

To  derive  the  information  matrix  Jy  (?/;  9)  in  terms  of  the  eomplete  data  statisties  Sx  {x]  9) 
and  Bx{x;  9)  requires  two  steps.  The  first  step  is  to  eompute  the  implieit  derivative  of  the  ob¬ 
served  data  support  funetion.  Erom  (2-1), 

SY{y;9)  =  XY{y]9)  =  j  f'^{x;9)dxj  j  fx{x;9)dx.  (3-7) 

Moving  the  denominator  inside  the  integral  in  the  numerator  and  multiplying  the  numerator 
and  denominator  of  the  integrand  by  fx{x;  9),  it  follows  from  (2-5)  and  (3-3)  that 

SY{y,9)  =  Eg[Sx{X-9)\X  e  L],  (3-8) 


where  the  eonditional  expeetation  is  defined  as  in  (2-4).  The  seeond  step  is  to  implieitly 
eompute  the  negative  seeond  derivative  of  the  observed  data  support  funetion.  Erom  (3-7), 

By{y9)  =  -X'^{y-9)  =  -  jj'^{x- 9)  dxj  j Jx{x-9)  dx  +  Sy{y-9)Sl{y9).  (3-9) 

Again,  moving  the  denominator  inside  the  integral  in  the  numerator  and  multiplying  the  nu¬ 
merator  and  denominator  of  the  integrand  by  fx{x;  9),  it  follows  from  (3-4)  and  (3-5)  that 


Iy{y  9)  =  Eg  [Bx{X-  9)\XeL\-Eg  [^x(X;  0)^I(X;  9)\XeL]+  Sy{y  9)Sl{y  9). 

(3-10) 

By  definition,  the  derivative  of  the  observed  data  support  funetion  is  zero  at  the  maximum 
likelihood  estimate,  that  is,  Sy{y;9)  =  0.  Thus,  the  observed  information  matrix  in  the 
general  ease  ean  be  expressed  entirely  in  terms  of  eonditional  expeetations  of  the  eomplete 
data  statisties  Sxix;  9)  and  Bx{x;  9): 


Iy{y9)  =  E^  Bx{X-9)\XeL 


-E, 


Sx{X-9)Sl{X-9)\XeR 


(3-11) 
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These  expectations  are  straightforward  to  compute  for  the  finite  and  dynamic  mixture  models 
considered  in  this  report,  and  they  result  in  intuitive  and  revealing  expressions  for  the  observed 
information  and  error-covariance  matrices  for  these  models. 

The  first  term  on  the  right-hand  side  of  (3-11)  represents  the  information  in  the  complete 
data;  the  second  term  is  a  correction  for  the  information  lost  to  the  missing  data.  To  see  this, 
recall  expression  (2-5)  for  the  conditional  density  fx\Yix\y;  0)  of  the  complete  data  random 
variable  X  given  y.  Let  Xx\Y{x\y]  9)  =  log  f x\Y{x\y]  9).  Taking  the  natural  logarithm  of 
(2-5)  and  rearranging  terms  gives 


Ay(|/;  9)  =  \x{,x]  9)  -  Xx\Y{x\y;  9). 


(3-12) 


Furthermore,  taking  the  conditional  expectation  as  in  (2-4)  of  minus  the  second  derivative  of 
this  expression  and  evaluating  the  result  at  the  estimate  9  gives 


lY{yJ)  =  E^  Bx{X;9)\XeL 


-E, 


-Xx\Y{X\y-9)\X  eL 


for  the  observed  information  matrix.  This  result  is  written  succinctly  as 

Iriy;  9)  =  Ix{x;  9)  -  Ix\Y{x\y]  9), 


(3-13) 


(3-14) 


where,  by  analogy  with  definition  (3-5),  the  matrix  Ix{x]9)  is  called  the  (conditional  ex¬ 
pected)  complete  data  observed  information  matrix,  and  Ix\Y{x\y]  9),  the  observed  informa¬ 
tion  matrix  associated  with  the  missing  data.  (For  brevity,  but  with  some  abuse  of  terminology, 
these  matrices  will  be  referred  to  as  the  complete  and  missing  information  matrices,  respec¬ 
tively.)  Clearly,  the  observed  information  decreases  as  the  information  lost  to  the  missing 
data  increases.  Consequently,  the  error-covariance  matrix  for  the  maximum  likelihood  esti¬ 
mate  9  (taken  here  to  be  the  inverse  of  the  observed  information  matrix)  increases  with  the 
information  lost  to  the  missing  data.  As  pointed  out  by  Louis,  the  factorization  (3-14)  is  an 
application  of  what  Orchard  and  Woodbury  [33]  call  the  “missing  information  principle”  to 
the  observed  information  matrix. 


3.2  INDEPENDENT  OBSERVATIONS 


In  this  case  of  independent  observations,  the  complete  data  and  observed  data  support 
functions  Ax,  Ay  and  their  first  and  second  derivatives  Sx,  Sy  and  —Bx,  —By  become 
summations,  and  expression  (3-11)  for  the  observed  information  matrix  becomes 


Jy(l/;  9)  =  Y,Ee  \BxAXi]  ^)  |  X,  G  E^  ISxAXi]  9)\X,e  R 


2  =  1  2=1 

n—1  n 

E  E^[SxAX^;9)\X,eR 

2=1 


E, 


Sl{Xy9)\X,eR, 


.  (3-15) 
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This  expression  from  Louis  [3]  simplifies  further  in  the  following  way.  Sinee  5'y  (?/;  6)  =  0, 
it  follows  from  (3-8)  that 


Y,EASxAX^;e)\X,eL, 


=  0. 


2=1 


Henee,  solving  for  the  fih  term, 


(3-16) 


SxAX^;e)\X,eL, 


.  E^[Sx,{X,;9)\X,eL, 


(3-17) 


By  straightforward  algebraie  manipulation,  it  is  easy  to  show  using  these  identities  that 


Sx{x-  e)sl{x-  e)\x  eR]=J2Es  \sxAx^■,  g)sUx,-  e)\x,e  l 


2=1 


^  El,  \SxAX^;  9)\x,e  r]  Ell  9)\X,e  R 


.  (3-18) 


2=1 


Thus,  the  observed  information  matrix  in  the  ease  of  independent  observations  beeomes 


My;  0)  =  J2^o  \BxAX^;  9)\X,e  R.]  E^  \SxAX^;  9)SlM^  0)  |  X,  e  L 


2=1 


2=1 


n 


E, 


SxM^\^)\X^eR  Eh  SlXXud)\X,eR^ 


(3-19) 


2=1 

This  simplified  expression  eliminates  the  double  sum  over  the  eross  terms  in  (3-15). 

Finally,  for  independent  and  identieally  distributed  data,  the  observed  information  ma¬ 
trix  Jy  (i/;  9)  ean  be  approximated  by  the  empirical  Fisher  information  matrix,  denoted  Riv;  9), 
and  defined  as  R{y;  =  ^Riy;  where 


-  1  1 

h(v,e)  =  Sy, (ft; (!,(;»)  --^SY(y,0)Sl(y,0)  (3-20) 

2=1 

is  the  empirieal  (sample)  eovarianee  matrix  of  the  seore  veetors  SYfyi;9).  (Reeall  that  if 
the  data  are  identieally  distributed,  the  seore  funetions  S'y.  are  all  the  same  funetion.)  Sinee 
S'y  (?/;  9)  =  0,  it  follows  that 


R{y;9)  =  Y,  SYAyi;9)SlM^9).  (3-21) 

2=1 

Using  relationship  (3-8),  the  empirieal  Fisher  information  matrix  ean  be  written  in  terms  of 
eonditional  expeetations  of  the  eomplete  data  seore  funetions: 


R{y;  9)  =  Y  EMxMp  9)\X,e  R]  0)  |  X,  G  R].  (3-22) 

2=1 
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The  appropriateness  of  this  approximation  to  the  observed  information  matrix  depends  on  the 
size  of  the  data  set,  as  diseussed  in  appendix  A.  See  Redner  and  Walker  [34]  and  Meilijson 
[35]  for  a  eomplete  diseussion  of  the  empirieal  Fisher  information  matrix  and  its  use  in  the 
EM  method. 

3.3  POSTERIOR  OBSERVED  INFORMATION 

By  analogy  with  the  definition  of  the  information  matrix  lyiv]  0),  the  posterior  infor¬ 
mation  matrix  /©|y(6'||/)  is  defined  as  minus  the  seeond  derivative  with  respeet  to  9  of  the 
posterior  observed  data  support  funetion  A©|y(6*||/).  From  (2-9), 

IeiY{e\y)  =  -X'^iy{e\y)  =  -A''|e(|/|0)  -  A"  (0),  (3-23) 

where  —X'f^Q{y\9)  is  the  information  matrix  Iy{y,9),  denoted  Jy|0(y|6*)  here  to  emphasize 
Bayesian  eonditioning  on  9,  and  —A© (6*)  is  the  prior  information  matrix,  denoted  Ie{9). 
Henee, 

Ie\y{9\y)  =  Iy\e{y\9)  +  Ie{9).  (3-24) 

When  evaluated  at  the  maximum  a  posteriori  estimate  9,  Ie\y{9\y)  and  /©(0)  are  ealled  the 
posterior  observed  information  matrix  and  the  prior  observed  information  matrix,  respee- 
tively.  Consequently,  the  posterior  observed  information  matrix  is  equal  to  the  observed 
information  matrix  plus  the  prior  observed  information  matrix.  Evaluating  (3-24)  at  9  and 
substituting  the  faetorization  (3-14)  for  the  observed  information  matrix  gives  the  following 
expression  for  the  posterior  observed  information  matrix: 

Ie\Y{9\y)  =  Ix\e{x\9)  -  Ix\Y,e{x\y,  9)  +  Iq{9),  (3-25) 

where  the  information  matriees  Ix\e{x\9)  and  Ix\Y,e{x\y,  9)  are  the  analogs  of  the  eomplete 
and  missing  information  matriees  Ix{x]9)  and  Ix\Y{x\y]9),  respeetively,  in  the  Bayesian 
model  for  9.  Thus,  the  posterior  observed  information  matrix  ean  be  written  entirely  in  terms 
of  complete  data  and  prior  statistics.  Again  for  brevity,  but  with  some  abuse  of  terminol¬ 
ogy,  the  combination  of  the  first  and  last  terms  in  (3-25)  will  be  referred  to  as  the  posterior 
complete  information  matrix,  denoted  /©|x(^|a:),  so  that 

Ie\Y{9\y)  =  Iq\x{9\x)  —  Ix\Y,e{.x\y,  9)]  (3-26) 

that  is,  the  posterior  observed  information  matrix  is  equal  to  the  posterior  complete  informa¬ 
tion  matrix  minus  the  information  lost  to  the  missing  data. 
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4.  OBSERVED  INFORMATION  FOR  FINITE  MIXTURE  MODELS 


Maximum  likelihood  estimation  and  eomputation  of  the  observed  information  matrix 
for  finite  mixture  models  (and  Gaussian  mixture  models,  in  partieular)  are  reviewed  in  this 
seetion.  Gaussian  mixture  models  are  the  basis  for  the  dynamie  mixture  models  eonsidered 
in  later  seetions. 

4.1  GENERAL  CASE 

Let  y  =  {yi  :  i  =  1, . . . ,  n}  be  a  given  set  of  n  independent  and  identieally  distributed 
p- variate  observations,  assumed  to  eome  from  a  mixture  of  m  distinet  sourees  in  unknown 
proportions.  The  goal  is  to  find  the  maximum  likelihood  estimates  of  the  parameters  for  eaeh 
souree  distribution  in  the  mixture,  and  the  proportion  eaeh  souree  eontributes  to  the  data. 
(In  general,  the  number  of  sourees  m  must  also  be  inferred  from  the  data,  but  that  problem 
is  not  addressed  here.)  Finding  these  estimates  would  be  straightforward  if  the  data  were 
labeled,  that  is,  if  eaeh  observation  eame  with  a  label  Zi  taking  a  value  in  the  set  m} 

indieating  the  souree  from  whieh  it  eame.  The  labels  z  =  {z*  :  i  =  1, . . . ,  n}  are  the  missing 
data  in  this  problem.  The  eomplete  data  are  then  x  =  {xi  :  i  =  1, . . . ,  n},  where  Xi  =  {yi,  Zi) 
is  the  eomplete  data  veetor  assoeiated  with  the  observed  data  veetor  y^. 

It  is  assumed  that  the  labels  2;  are  independent;  sinee  the  observed  data  y  are  assumed 
to  be  independent,  it  follows  that  the  eomplete  data  x  are  independent  as  well.  Moreover, 
sinee  the  seetion  Li  of  the  eomplete  data  sample  spaee  Xi  determined  by  the  observation  yi 
is  simply  the  set  {yi}  x  {!,...  ,m},  it  follows  that  the  integrals  over  Lj  deseribed  in  the 
preeeding  seetions  are  simply  sums  over  the  m  possible  sourees  of  yi.  In  partieular,  using  the 
identity 


(4-1) 


the  observed  data  likelihood  funetion  for  the  sample  yi  is  given  by  the  marginal 


(4-2) 


Thus,  fYi{yi]  6)  is  a  mixture  density  funetion,  where  fYi\Zi{yi\i]  G)  is  the  density  funetion  for 
the  sample  yi  given  that  it  eomes  from  souree  j,  and  fz^j]  Q)  is  the  a  priori  probability  of 
drawing  a  sample  from  this  souree. 

Given  an  estimate  9^^^  for  the  mixture  parameters  9,  the  updated  estimate  9^^^^^^  is 
obtained  by  evaluating  the  eonditional  expeetations  (2-7),  and  maximizing  the  sum  of  the 
results,  as  in  (2-6).  Combining  the  previous  two  results  with  the  identity 


fxi\Yi{xi\yi]9) 


^  fYiZi{yuZi]9) 

fYiiyn^)  fYi{yi]9) 


fzi\Yi{zi\yi]9),  (4-3) 
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the  conditional  expectations  (2-7)  for  the  finite  mixture  model  become 


0)\Xi  e  Li]  —  ^  9)  +  Xziij]  9)]  fziiYiUlUi,  9^^^),  (4-4) 


i=i 


where 


fY,\zAyi\r^^)fzAr^^) 

Ya=i  fYi\Zi{yi\l]9)fzi{l]9) 


(4-5) 


is  the  conditional  probability  that  the  sample  yi  comes  from  source  j. 

The  observed  information  matrix  for  the  finite  mixture  model  is  given  by  (3-19).  The 
conditional  expectations  in  (3-19)  are  computed  as  in  (4-4).  In  particular, 


Ei^[BxAXr]9)\X,eRi\ 


E^[SxAX^]9)\X,eRi\ 


m 

-Yl  fzi\YiU\yi',^),  (4-6) 

m 

Y  (4-7) 

i=i 


and 


E^[SxXX^■,9)Sl^{X,■9)\XieL,]  = 

m 

X]  -f  A^.(j;0)]  [\Yi\Zi{yi\j'-,9)  +  \zi{j--,9)\  fzi\Yi{j\yf,9).  (4-8) 

The  expectations  (4-6)  and  (4-7)  simplify  further  by  interchanging  the  order  of  summation 
and  differentiation: 

E^[BxAXi;9)\XieRi]  =  -  [Ax,(X,;  0)  |  G  L,]}" ,  (4-9) 

E^[SxXXi;9)\XieRi]  =  {Eg[XxXX^;9)  \  X,  e  L,]}' .  (4-10) 

Therefore,  once  the  expectation  (4-4)  required  to  compute  9  is  obtained,  it  need  only  be  dif¬ 
ferentiated  twice  to  obtain  two  out  of  the  three  expectations  required  to  compute  the  observed 
information  matrix  Jy  (?/;  9),  as  given  by  (3-19). 

4.2  GAUSSIAN  MIXTURES 

For  finite  Gaussian  (or  normal)  mixture  models,  the  conditional  observed  data  density 
function  /y.|Zi  {yi\zi]  9)  is  taken  to  be  the  multivariate  Gaussian  density  function  (/((i/jl/i^,, 
where 

=  (27,)p/2|^|i/2  -  ^yc-\a  -  &)|  (4-11) 

is  the  p-variate  Gaussian  density  function  with  mean  vector  b  and  covariance  matrix  C.  Addi¬ 
tionally,  the  prior  probability  fz^  {zf,  9)  is  taken  to  be  the  probability  tt^.,  where  {tti,  . . . , 
is  a  fixed  set  of  a  priori  probabilities. 
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4.2.1  Parameter  Estimation 


The  parameters  to  be  estimated  in  this  model  are,  in  general,  the  mean  veetors  /ij  ,  the 
eovarianee  matriees  Sj,  and  the  prior  probabilities  ttj  for  the  missing  measurement-to-souree 
labels.  However,  to  simplify  the  analysis  of  the  observed  information  matrix,  unequal  but 
known  values  of  the  eovarianee  matriees  are  assumed.  Henee,  the  parameter  veetor  9  for 
this  problem  eontains  the  mean  veetors  fij  and  the  mixing  proportions  vTj.  Consequently,  the 
eomplete  data  support  funetion  Xxi{xi]  9)  for  this  model  is 

=  -\{yi  -  -  f^zi)  +  logTT^,,  (4-12) 


where  the  first  and  seeond  terms  eorrespond  to  the  eonditional  observed  data  support  funetion 
^Yi\Zi  iyi\zi;  9)  and  the  prior  support  funetion  A^.  (z*;  9),  respeetively,  and  terms  not  dependent 
on  the  parameter  veetor  9  are  dropped  from  this  expression. 

It  is  important  to  emphasize  that  the  mixing  proportions  are  not  independent.  In 
partieular, 

m  m 

^  TTj  =  ^  0)  =  1,  TTj  >  0,  j  =  1, . . . ,  m,  (4-13) 

j=i  i=i 

and  one  must  be  eareful  to  aeeount  for  this  dependenee  when  estimating  the  mixing  propor¬ 
tions  and  eomputing  the  observed  information  matrix  /y(|/;  9).  In  the  sequel,  is  used  to 
denote  1  —  tti  —  •  •  •  —  and  the  full  expression  is  employed  when  taking  derivatives  of 
the  eomplete  data  support  funetion  to  ensure  proper  aeeounting  of  the  eonstraint  in  (4-13). 

The  update  equations  for  the  mixing  proportions  iij  and  mean  veetors  fij  are  obtained  by 
substituting  the  Gaussian  mixture  model  deseribed  above  into  expressions  (4-4)  and  (4-5),  and 
performing  the  maximization  in  (2-6)  subjeet  to  the  eonstraint  in  (4-13).  To  simplify  notation, 
let  Wji  denote  the  eonditional  probability  fzi\Yi  {j\yi',  0)  that  observation  i/i  eomes  from  souree 
j.  Then,  given  estimates  for  Hj  and  /ij  from  the  kth  iteration,  the  update  equations  are 


vr. 


(fc+i)  _ 


n 


E 

2  =  1 


W 


(k) 
ji  1 


and 


with 


nvr. 


{k+i) 


E 

2  =  1 


(k) 

wyyh 


T^fU{yi\yf\^j) 


(4-14) 


(4-15) 


(4-16) 
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4.2.2  Observed  Information  Matrix  Computation 


The  observed  information  matrix  for  this  ease  is  an  ((m  —  1)  +  pm)  x  ((m  —  1)  +  pm) 
bloek  matrix,  where  the  (m  —  1)  x  (m  —  1)  bloek  in  the  upper  left-hand  eorner  contains  the 
information  contribution  from  the  first  m  —  1  mixing  proportions,  and  the  pm  x  pm  block  in 
the  lower  right-hand  comer  contains  the  information  contribution  due  to  the  m  mean  vectors, 
each  of  length  p.  The  off-diagonal  blocks  pertain  to  information  in  the  various  mixing  propor¬ 
tion  and  mean  vector  combinations.  The  observed  information  matrix  computation  requires 
computation  of  the  expectations  (4-8)  through  (4-10).  Let  aj,l3i  denote  any  two  parame¬ 
ters  in  the  set  {tti,  . . . ,  vr^-i,  Pi,  ■  ■  ■ ,  Pm}-  To  simplify  notation,  the  following  shorthand  is 
introduced  for  expressions  (4-8)  through  (4-10): 


(4-18) 


(4-19) 


Substituting  the  complete  data  support  function  (4- 1 2)  for  the  Gaussian  mixture  into  the  above 
expressions  yields  the  following  results: 

a.  From  (4-19), 


{Si)^.  =  -  Pj),  j  =  1, . . . ,  m. 


(4-20) 

(4-21) 


b.  From  (4-18), 


0,  j  7^  I, 


(4-23) 


j  =  l...,m-l,  l  =  l,...,m. 


(4-24) 


c.  From  (4-17), 
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w 


ivi  -  -  yj)  ,  j  =  h 


0,  j  7^  I, 

^ivi  -  jj  =  ^,---,m-l,  j  =  1, 


j,  /  =  1, . . .  ,m,  (4-26) 


0,  j,l  =  l,...,m-l,  j 

j  =  1,. ..  ,m  -  1,  /  =  m. 


(4-27) 


The  expectations  required  in  (4-25)  through  (4-27)  are  obtained  using  the  following  results 
for  the  first  derivatives  of  the  complete  data  support  function: 


V^.Ax,((2/i,2;i);^) 


<  — ^  if  =  m, 
0  otherwise, 

\ 


0  otherwise. 


(4-28) 


(4-29) 


Finally,  let  denote  the  sub-block  of  the  observed  information  matrix  associated 
with  the  parameter  estimates  dj,  A-  Then,  from  (3-19),  using  the  above  shorthand, 

n  n  n 

=  E  -  E  +  E  {SJ )a .  (4-30) 

2=1  2=1  2=1 

Substituting  (4-20)  through  (4-27)  into  (4-30),  it  follows  that  the  terms  in  (4-22)  cancel  with 
the  terms  in  (4-25).  Also,  the  sum  of  the  terms  in  (4-27)  equals  zero  when  evaluated  at 
the  estimates  tcj  and  fij  as  given  by  (4-14)  and  (4-15).  These  results  lead  to  the  following 
simplifications  of  the  observed  information  matrix  for  Gaussian  mixtures: 


n 


=  E<4)»,(s7>»,. 

2=1 

j,  /  =  1, . . .  ,m  -  1, 

(4-31) 

n 

=  E<4)»,(s7>a,. 

2=1 

(4-32) 

n 

=  E<4)fe(S4A. 

2=1 

(4-33) 

Thus,  use  of  the  empirical  Fisher  information  matrix  as  an  approximation  to  the  observed 
information  matrix  is  somewhat  justified  in  this  case,  although  the  extra  calculations  in  (4-23) 
and  (4-26)  required  to  obtain  the  exact  observed  information  matrix  are  hardly  prohibitive. 
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5.  OBSERVED  INFORMATION  FOR  DYNAMIC  MIXTURE  MODELS 


As  discussed  in  the  introduetion,  dynamie  mixtures  are  useful  models  for  data  eolleeted 
over  time  originating  from  a  number  of  distinet  moving  sourees.  The  goal  is  to  estimate 
the  souree  trajeetories,  that  is,  to  “traek”  the  sourees  in  the  parameter  spaee  and  to  aeeurately 
eharaeterize  the  uneertainty  in  the  traek  estimates.  Intuitively,  the  uneertainty  in  the  estimated 
traeks  should  inerease  when  the  sourees  interfere  with  eaeh  other  in  the  observation  spaee, 
for  example,  when  two  or  more  trajeetories  eross  paths;  measurement-to-souree  assignment 
uneertainty  is  often  a  signifieant  souree  of  uneertainty  in  these  situations. 

In  the  following  seetions  two  important  dynamie  mixture  models  are  presented,  one 
in  whieh  the  sourees  follow  unknown  but  deterministie  trajeetories,  and  one  in  whieh  the 
trajeetories  themselves  are  subjeet  to  random  perturbations.  Gaussian  mixtures  are  used  to 
model  the  distribution  of  the  observations  at  eaeh  sampling  time  in  both  oases.  It  is  assumed 
that  at  eaeh  time  t  =  1, . . . ,  T,  a  set  of  independent  samples  yt  =  {yu},  i  =  1,  ■  ■  ■ , 
is  obtained  from  the  mixture.  Let  y  =  {yt}  denote  the  entire  oolleotion  of  samples,  and  let 
Xt  =  {xti}  and  x  =  {xt}  denote  the  oorresponding  sets  of  oomplete  data  samples.  It  is  also 
assumed  that  the  sets  xt  and  yt  are  independent  aoross  the  sampling  times.  Note,  however, 
that  sinoe  the  sourees  in  the  mixture  are  in  motion,  these  sets  are  not  identioally  distributed 
from  one  time  to  the  next. 


5.1  DETERMINISTIC  MOTION 


In  the  deterministie  ease  the  motion  model  for  eaeh  souree  is  embedded  in  the  observa¬ 
tion  matrix  of  the  standard  multivariate  linear  model.  In  partieular,  assuming  that  the  p  x  1 
measurement  veetor  y^  eomes  from  souree  j,  yu  is  related  to  the  g  x  1  veetor  of  kinematie 
parameters  yj  through  the  equation 


yti  T  Gt®’ 

where  Mjt  isapx  q  matrix  that  maps  pj  to  the  observation  spaee  at  time  t,  and  eju  is  a p  x  1 
Gaussian  random  veetor  with  zero  mean  and  eovarianee  matrix  Rjt.  The  random  errors  eju 
are  assumed  independent. 

For  example,  suppose  m  sourees  move  with  eonstant  veloeity  in  a  plane,  and  observa¬ 
tions  of  the  souree  positions  are  obtained  at  multiple  sampling  times.  The  trajeetory  of  souree 
j  is  eompletely  speeified  by  its  position  and  veloeity  at  an  arbitrary  referenee  time  f*.  Let  pj 
be  the  xp-position  and  xp- veloeity  of  souree  j  at  time  f*.  The  position  of  souree  j  at  any  time 
t  is  given  by  MjtPj,  where 


Mp  = 


0  At,u  0 

1  0  At,u 


(5-2) 


1 

0 
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and  At^tt  is  the  elapsed  time  between  t  and  t*. 

Given  the  linear  Gaussian  model  (5-1)  for  observation  yu  assuming  source  j,  the  con¬ 
ditional  observed  data  density  function  fYuiZuiVul^tu  is  the  multivariate  Gaussian  density 
function  (l){yu\Mjtyj,  Rjt).  As  before,  the  prior  probability  0)  of  observing  source 

j  at  time  t  is  taken  to  be  the  probability  vr^,,  where  {tti,  . . . ,  vr^}  is  a  fixed  set  of  a  priori 
probabilities  for  observing  data  from  each  source. 

5.1.1  Parameter  Estimation 

The  parameters  to  be  estimated  in  this  model  are,  in  general,  the  kinematic  parameter 
vectors  pj  and  the  measurement  covariance  matrices  Rj,  together  with  the  prior  probabilities 
71  j  for  the  missing  measurement-to-source  assignments.  However,  to  simplify  the  analysis  of 
the  observed  information  matrix  for  this  case,  unequal  but  known  values  of  the  covariance 
matrices  Rj  are  assumed.  Thus,  the  parameter  vector  9  contains  the  kinematic  vectors  pj  and 
the  mixing  proportions  nj.  Consequently,  the  complete  data  support  function  Xxiixi]  9)  for 
this  model  is 


Xxuiiyti,  zti);  9)  =  [-liyti  -  MjtPjY  Rj  ^{yu  -  MjtPj)  +  logTr^]  ,  (5-3) 


where  the  first  and  second  terms  in  brackets  correspond  to  the  conditional  support  function 
XYu\Zti{yti\ztii  9)  and  the  prior  support  function  Xzui^u]  0),  respectively,  and  terms  not  de¬ 
pendent  on  9  are  dropped  from  this  expression  for  clarity. 

The  update  equations  for  the  mixing  proportions  vr  and  the  kinematic  parameters  p  are 
obtained  by  substituting  the  linear  Gaussian  dynamic  mixture  model  described  above  into  the 
analogs  of  expressions  (4-4)  and  (4-5)  for  data  collected  over  more  than  one  sampling  time, 
and  maximizing  the  resulting  expressions  with  respect  to  vr  and  p  as  in  (2-6)  subject  to  the 
constraint  (4-13);  the  single  sum  over  the  measurement  index  i  in  (2-6)  is  replaced  by  the 
double  sum  over  the  time  and  measurement  indices  t  and  i,  respectively,  in  this  case.  Let  Wju 
denote  the  conditional  probability  fzuiYuijlyti',  9)  that  observation  yu  comes  from  source  j. 
Then,  given  estimates  for  vr^  and  pj  from  the  kth  EM  iteration,  the  update  equations  for  the 
conditional  probabilities  and  mixing  proportions  are 


T^fU{yu\Mjtpf\  Rjt) 


(5-4) 


Z7=i 


and 


(5-5) 


t=l  i=l 


where  n  =  Y^ 


T 

t=l 


rit  is  the  total  number  of  observations  over  all  sampling  times. 
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The  update  equations  for  the  kinematie  parameters  /ij  assume  intuitively  appealing 
forms  when  written  in  terms  of  the  “synthetie”  measurements* 


~{k) 

y)t 


^jti  yti' 
^nt  (k) 
Z^i=i 

and  synthetie  measurement  eovarianee  matriees 


(5-6) 


p(^) _ 


Rjt 


Ent 
i=l 


(5-7) 


w 


jti 


The  synthetie  measurement  jjjt  for  source  j  at  time  t  is  the  probabilistic  centroid  of  the  ob¬ 
servations  Uti  with  respect  to  the  assignment  probabilities  Wju.  The  synthetic  measurement 
covariance  matrix  Rjt  is  the  measurement  covariance  matrix  for  source  j  at  time  t  divided  by 
the  expected  number  of  measurements  from  this  source,  conditioned  on  the  observations  yu. 
To  see  this,  define  the  indicator  functions 


Zti)) 


1,  if  zu=j, 
0,  otherwise, 


(5-8) 


and  let  njt{xt)  =  Ym=i  be  the  number  of  measurements  that  come  from  source  j  at 

time  t.  Then, 


m  nt 


nt 


E[n,t{Xt)\Xt  G  Lt]  =  EE  ^  ^  Wjti-  (5  9) 

(=1  i=l  i=l 

is  the  expected  number  of  observations  yu  that  come  from  source  j.  Incidentally,  the  a  priori 
expected  number  of  observations  from  source  j  at  time  t  is 

m  nt 

E[njt{Xt)]  =  =  ntTTj-  (5-10) 

1=1  i=l 

Using  expressions  (5-6)  and  (5-7)  for  the  synthetic  measurements  and  synthetic  mea¬ 
surement  covariance  matrices  for  source  j,  the  update  equation  for  the  kinematic  parameter 
vector  pj  is 


E  aU'’  =  E 


T  r 


(5-11) 


,  t=l 


,  t=l 


This  set  of  linear  equations  to  be  solved  for  pj  is  the  set  of  normal  equations  for  pj  from 
linear  least-squares  theory.  It  follows  that  the  updated  estimate  for  pj  is  the  weighted  least- 
squares  estimate  for  pj  given  the  synthetic  measurements  yjt  with  weights  determined  by  the 
synthetic  measurement  covariance  matrices  Rju 

*The  term  “synthetic”  in  this  context  is  adopted  from  Streit  and  Luginbuhl  [8]. 
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5.1.2  Observed  Information  Matrix  Computation 

The  observed  information  matrix  for  the  linear  Gaussian  dynamic  mixture  model  is 
similar  to  that  for  the  standard  Gaussian  mixture  model  as  given  by  expressions  (4-20)  through 
(4-33),  but  with  the  subscripts  for  the  measurement  index  i  replaced  with  subscripts  for  the 
time  and  measurement  indices  t  and  i,  respectively,  and  the  single  sums  over  i  replaced  with 
double  sums  over  t  and  i.  In  particular,  the  observed  information  matrix  for  this  case  is  an 
((m  —  1)  -f  qm)  x  ((m  —  1)  -f  qm)  block  matrix,  where  the  (m  —  1)  x  (m  —  1)  block  in 
the  upper  left-hand  comer  contains  the  information  contribution  from  the  first  m  —  1  mixing 
proportions,  and  the  qm  x  qm  block  in  the  lower  right-hand  comer  contains  the  information 
contribution  due  to  the  m  kinematic  parameter  vectors,  each  of  length  q.  Substituting  the 
complete  data  support  function  (5-3)  into  the  analogs  of  expressions  (4-17)  through  (4-19) 
for  data  collected  over  multiple  sampling  times  gives  the  following  results: 

a.  From  the  time-dependent  form  of  (4-19), 


'^mi  j  1,  .  .  .  ,171  1, 

{Su)t,j  =  WjuMj^Rj^^{yti  -  j  =  l,...,m. 


(5-12) 

(5-13) 


b.  From  the  time-dependent  form  of  (4-18), 


j,l  =  l...,m-l,  (5-14) 


=  0,  j  =  l...,m-l,  l  =  l,...,m. 


(5-16) 


c.  From  the  time-dependent  form  of  (4-17), 


WjtiMj^Rj^^{yu  -  Mjtyj){yti  -  Rjt  Mjt,  j  =  I, 


ti  Rjltl 


j  7^  ^ 

j,l  =  1,. . .  ,m,  (5-18) 


0, 
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(SuSl) 


-  MjtyjY j,  /  =  1, . . . ,  m  -  1,  j  =  /, 

<  0,  j,l  =  l,...,m-l,  j  (5-19) 

-  MmtyraY Rmt^mU  j  =  1,  •  •  •  ,  “  1,  I  =  m. 


As  before,  let  jSi  denote  any  two  parameters  in  the  set  {tti,  . . . ,  /U-i, . . . ,  yWm},  and  let 

denote  the  sub-block  of  the  observed  information  matrix  associated  with  the  parameter 
estimates  ctj,  f3i.  Then,  from  the  time-dependent  form  of  (3-19),  using  the  above  shorthand, 

T  nt  T  nt  T  m 

4, A  +  E E  {S,I>A .  (5-20) 

t=l  i=l  t=l  i=l  t=l  i=l 

Substituting  (5-12)  through  (5-19)  into  (5-20),  it  follows  that  the  terms  in  (5-14)  cancel  with 
the  terms  in  (5-17),  and  that  the  sum  of  the  terms  in  (5-19)  equals  zero  when  evaluated  at 
the  estimates  fj  and  fij,  as  given  by  (5-5)  and  (5-11).  These  results  lead  to  the  following 
simplifications  of  the  observed  information  matrix  for  linear  Gaussian  dynamic  mixtures: 


T  nt 


4,t,  =  EE<4i>»44l>i,. 

t=l  i=l 

j,  1  =  1,- 

..,m-  1, 

(5-21) 

T  nt 

4, A  =  EE<4i)»,(S.I>A.. 

t=l  i=l 

j  =  1, •  •  ■ 

. ,  m  —  1,  1  =  1,.. 

. ,  m,  (5-22) 

T  nt 

=  EE<4i>A,(S,T)A. 

j,  1  =  1,- 

..,m,  j^l. 

(5-23) 

t=l  i=l 


Note,  however,  that  use  of  the  empirical  Fisher  information  matrix  as  an  approximation  to  the 
observed  information  matrix  is  not  strictly  speaking  justified  in  this  case,  as  the  observations 
yti  are  not  identically  distributed  across  sampling  times.  Indeed,  the  mixture  distribution  for 
the  observations  yu  in  general  changes  location  and  shape  from  one  time  to  the  next  due  to 
the  motion  of  the  sources. 


5.1.3  Interpretation  of  Complete  Information  Matrix 

Before  proceeding  to  stochastic  motion  models  for  dynamic  mixtures,  it  is  worth  ex¬ 
amining  the  statistical  interpretation  of  the  complete  information  matrix  for  the  deterministic 
case.  The  first  term  in  (5-20)  is  the  sub-block  of  the  complete  information  matrix  associated 
with  the  estimates  /3g  the  last  two  terms  represent  the  information  lost  to  the  missing  data. 
Let  denote  the  qm  x  qm  block  of  the  complete  information  matrix  associated  with  the 
collection  of  kinematic  vectors  fij,  let  denote  the  jth  diagonal  q  x  q  sub-block  of  this 

matrix,  and  let  {uj  :  j  =  1, . . . ,  m}  be  the  collection  of  unit  vectors  of  length  m,  where  the 
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jth  element  of  Uj  equals  one  and  all  other  elements  of  Uj  equal  zero.  Substituting  (5-15)  into 
the  first  term  of  (5-20)  and  using  the  synthetie  measurement  eovarianee  matriees  (5-7)  gives 

m  m  T 

[ix]fip.  =  ®  ^  (5-24) 

i=i  i=i  t=i 

This  result  may  be  interpreted  in  terms  of  the  M-step  at  the  final  EM  iteration  (that  is,  iteration 
k  =  00)  as  follows.  For  eaeh  j  G  {1, . . . ,  m},  eonsider  the  multivariate  linear  model 


Vjt  MjtHj  -\-  'yjt,  t  1, . . . ,  T, 


(5-25) 


where  jjjt  is  a  p  x  1  measurement  veetor,  Mjt  is  a  known  p  x  q  observation  matrix,  pj  is  a 
g  X  1  parameter  veetor  to  be  estimated,  and  jjt  are  p  x  1  independent,  normally  distributed 
noise  veetors  with  zero  means  and  known  eovarianee  matriees  Rjt.  Then, 

(T  \  ^ 


(5-26) 


is  the  minimum  varianee  unbiased  (MVU)  estimate  for  pj  with  error-eovarianee  matrix 


a 


jt 


(5-27) 


,t=i 


assuming  that  this  matrix  has  full  rank.  The  inverse  of  this  matrix  is  the  Fisher  information 
matrix  for  this  problem.  Comparing  (5-26)  with  (5-11),  it  follows  that  the  M-step  at  the  final 
EM  iteration  is  equivalent  to  the  MVU  estimation  problem  for  the  multivariate  linear  model 
(5-25)  with  independent  measurements  and  known  measurement  eovarianee  matriees 
Furthermore,  from  (5-27)  and  (5-24),  it  follows  that  the  eomplete  information  matrix 
for  pj  is  equivalent  to  the  Fisher  information  matrix  for  pj  for  this  MVU  estimation  problem; 
that  is. 

Thus,  the  observed  information  matrix  for  pj  ean  be  written  using  the  missing  information 
prineiple  as  in  (3-14)  as  follows: 


Vyhjkj  -  (5-29) 

where  *^he  information  lost  to  the  missing  data.  This  seemingly  obvious  eon- 

neetion  between  the  eomplete  information  matrix  obtained  at  the  final  EM  iteration  and  the 
Fisher  information  matrix  obtained  from  the  equivalent  MVU  estimation  problem  for  this  de- 
terministie  dynamie  mixture  model  leads  to  a  elearer  understanding  of  the  analogous  result 
for  the  stoehastie  model  diseussed  in  the  next  seetion. 


30 


5.2  STOCHASTIC  MOTION 


Suppose  the  trajeetory  of  eaeh  source  is  subject  to  random  perturbations  about  an  under¬ 
lying  deterministic  motion  model.  Then,  each  source  trajectory  may  be  treated  as  a  sequence 
of  random  variables  for  which  the  deterministic  motion  model  is  the  mean.  In  particular,  let 
/ijt  denote  the  kinematic  “state”  of  source  j  at  time  t.  Then,  it  is  assumed  that  the  states 
fij  =  {/ijt  :  f  =  0, 1, . . . ,  T}  are  related  by  the  first-order  Gauss-Markov  process 

T  (5-30) 

where  are  known  qxq  state  transition  matrices,  and  5jt  are  independent  g  x  1  Gaussian 

random  vectors  with  zero  means  and  known  covariance  matrices  Qjt.  Additionally,  the  state 
of  source  j  at  time  to  is  assumed  to  be  normally  distributed  with  mean  rjj  and  covariance 
matrix  Tj.  As  for  the  deterministic  case,  it  is  assumed  that  the  observations  yu  are  related 
linearly  to  the  source  states  jjjt  so  that,  assuming  that  observation  yu  comes  from  source  j, 

Vti  T  ^jtii  (5-31) 


where  again  Mjt  are  known  p  x  q  observation  matrices,  and  eju  are  independent  p  xl  Gaus¬ 
sian  random  vectors  with  zero  means  and  known  covariance  matrices  Rjt.  Furthermore,  it  is 
assumed  that  the  random  perturbations  Sjt  and  ejti  are  independent. 

Consider  again  the  constant-velocity  motion  example  presented  for  the  deterministic 
case.  As  before,  suppose  that  m  sources  move  (nominally)  with  constant  velocity  in  a  plane, 
and  that  observations  of  source  positions  are  obtained  at  multiple  sampling  times,  but  that  the 
kinematic  state  vectors  for  the  sources  are  subject  to  random  perturbations  between  sampling 
times.  Then,  to  within  these  perturbations,  the  position  and  velocity  of  source  j  at  time  t  can 
be  predicted  based  on  the  position  and  velocity  of  the  source  at  time  t  —  1  using  the  state 
transition  matrix  For  the  nominal  constant- velocity  model,  the  predicted  xg-position 

and  x?/- velocity  of  source  j  at  time  t  is  with  state  transition  matrix 


1  0  0 
0  1  0 
0  0  1  0 
0  0  0  1 


(5-32) 


The  position  of  source  j  at  time  t  is  simply  Mjtpjt,  with  observation  matrix 


M,*  = 


10  0  0 
0  10  0 


(5-33) 
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With  the  distributional  assumptions  on  the  souree  states  n  =  {nj  :  j  =  1, . . . ,  m}  in  the 
Gauss-Markov  process  model  (5-30),  and  assuming  a  diffuse  prior  for  the  mixing  proportions 
TT  =  {vTj  :  j  =  1, . . . ,  m},  the  joint  density  function  for  the  parameters  9  =  (vr,  /r)  is 

m  T 

fe{{7r,  /i))  cx  Yl  Tj)  JJ  Qj,t-i),  (5-34) 

j=i  t=i 

where  the  sources  are  assumed  to  be  independent,  and  the  states  for  each  source  are  condi¬ 
tionally  independent  from  one  sampling  time  to  the  next.  The  complete  data  and  observed 
data  density  functions  for  this  stochastic  dynamic  mixture  model  are  essentially  the  same  as 
for  the  deterministic  model,  except  for  the  dependence  of  the  state  vectors  fXjt  on  the  sampling 
times.  Specifically,  the  complete  data  density  function  is 

T  rit 

fx\e{{y,z)\e)  =  nn  [tTj  (j){yti\Mjtfijt,  ■  (5  35) 

t=l  i=l 

Marginalizing  over  the  missing  measurement-to-source  assignments  2;,  and  interchanging  the 
order  of  the  sums  and  products,  gives  the  observed  data  density  function 

T  nt  m 

fY\Q{y\9)  =  nnE  TTj  (l){yu\Mjtyjt,  Rjt)-  (5-36) 

t=i  i=i  j=i 

5.2.1  State  Estimation 

The  parameters  to  be  estimated  in  this  model  are  the  mixing  proportions  tt  and  the 
kinematic  state  vectors  y.  The  state  vectors  fij  =  {fijt  :  t  =  0, 1, . . . ,  T}  for  source  j  are 
treated  as  a  concatenated  state  (column)  vector  for  the  rest  of  this  discussion.  The  system 
matrices  {Qjt},  {Mjt},  and  {Rjt}  are  all  assumed  to  be  known. 

The  update  equations  for  the  maximum  a  posteriori  estimates  of  vr  and  p  are  obtained 
by  substituting  (5-34)  and  (5-35)  into  (2-11)  and  performing  the  necessary  expectations  and 
maximizations.  Let 

vh(0|0W)  =  Esi.,[Xxie{Xmx  eL]  +  Xe{9)  (5-37) 

denote  the  E-step  at  the  kth  EM  iteration.  Substituting  (5-34)  and  (5-35)  into  this  expression 
and  performing  the  expectation  gives 

m  r  T 

q/(0|0W)  =  E  \og(t){pjo\rij,T j)  +  ^  \og(t){pjt\Fjtj-iPjj-i,Qj^t-i) 
j=i  I  t=i 

T  rit  m  T  nt 

+  EE“iS  log  (l){yti\MjtPjt,  Rjt)  \  +  EEE  Wju  logTTj,  (5-38) 
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where  the  eonditional  probabilities  Wju  =  fzu\Yu,e{zti\yti,0)  are  obtained  from  the  ratio  of 
(5-35)  and  (5-36): 

_{k)  _ 

(k),f  W  D  (5-39) 

Ez=i  T^i  (t){yti\Muy>u,Rit) 

Sinee  the  prior  distribution  for  the  mixing  proportions  Hj  is  assumed  to  be  uninformative  (that 
is,  uniform),  the  update  equations  for  Hj  are  identieal  to  the  update  equations  (5-5)  for  the 
deterministie  ease. 

The  update  equations  for  the  kinematie  state  veetors  Hj  are  more  eomplieated  for  the 
stoehastie  case  because  of  the  time  dependence  between  states  due  to  the  Markov  model 
(5-30).  As  for  the  deterministic  case,  these  update  equations  assume  intuitively  appealing 
forms  when  the  E-step  is  written  in  terms  of  the  synthetic  measurements  (5-6)  and  synthetic 
measurement  covariance  matrices  (5-7).  After  some  tedious  algebraic  manipulation,  and  ig¬ 
noring  an  additive  constant  that  does  not  depend  on  vr  or  /i,  the  result  is 


m  ^  T 

^(0|0W)  = 

j=i  I  t=i 

T  m  T  nt 

+  Y  (Piyjt^\Mjtyjt, Rft)  (  +  5^ X]  (5-40) 

t=l  )  j=l  t=l  i=l 

Let  {e^  :  t  =  0,l,...,T}be  the  collection  of  unit  vectors  of  length  T  +  1,  where  the  fth 
element  of  e°  equals  one  and  all  other  elements  of  equal  zero,  and  let  for  all 

f,  r  =  0, 1, . . . ,  T.  Taking  the  derivative  of  (5-40)  with  respect  to  the  and  setting  the  result 
equal  to  zero  gives  the  following  q{T  -|-1)  x  q{T  -|-1)  system  of  equations  for  source  j: 


r(^) 


'  [data)j  +  hprior)j 


(fc+1)  _ 


where 


j(^) 

(data)j 


(data)j 


i{k)  .  7 

^{data)j  '  ^{prior)j 


t=l 

T 


Y  et  Mj,[RfY%\ 


(5-41) 

(5-42) 

(5-43) 


t=i 


and 


T-l 


^  {prior)  j 


E°m  0  ft!  +  e;,  ®  qjY  +  Y  ^‘1  ® 


t=l 


t=0 


d {prior)  j 


t=l 

Co  0 


Y  El,,  0  FY.QjY  -  E  ®  (5-44) 

(5-45) 


t=i 
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The  linear  system  of  equations  (5-41)  ean  be  effieiently  solved  for  using  a  speeialized 
form  of  Gaussian  elimination  for  bloek-tridiagonal  systems.  Alternatively,  as  shown  by  Streit 
and  Luginbuhl  in  [8],  the  system  ean  be  solved  efficiently  using  a  fixed  interval  Kalman 
smoothing  filter.  To  see  this,  observe  that  the  term  in  braces  in  expression  (5-40)  is  the  natural 
logarithm  of  the  joint  density  function  for  the  random  state  sequence  /ij  with  Gauss-Markov 
process  model  (5-30),  and  observations  jjjt  with  measurement  model 

Vjt  =  Mjtfijt  +  Ijt,  t  =  (5-46) 

where  'jjt  are  independent  p  x  1  normally  distributed  noise  vectors  with  zero  means  and 
known  covariance  matrices  Rjt.  The  joint  density  function  for  the  combined  model  (5-30) 
and  (5-46)  is  the  joint  density  function  for  the  fixed  interval  smoothing  problem  of  Kalman 
filtering  theory.  (A  useful  reference  on  Kalman  filtering  theory  for  this  work  is  the  book 
by  Mendel  [36].)  Hence,  the  state  estimates  jlj  obtained  from  the  M-step  at  the  final  EM 
iteration  {k  =  oo)  are  equivalent  to  the  minimum  mean-squared  error  (MMSE)  estimates  for 
Pj  obtained  from  the  fixed  interval  Kalman  smoothing  filter  given  the  synthetic  measurements 
and  synthetic  measurement  covariance  matrices  ■ 

The  linear  Gauss-Markov  dynamic  mixture  model  presented  in  this  section  is  precisely 
the  tracking  model  used  in  the  PMHT  method  of  Streit  and  Euginbuhl  [8].  In  their  report, 
the  authors  attempt  to  interpret  the  Eisher  information  matrix  for  the  states  pj  in  terms  of  the 
error-covariance  matrices  obtained  at  the  output  of  the  equivalent  Kalman  smoothing  filter  for 
the  M-step  at  the  final  EM  iteration.  Their  interpretation  is  not  theoretically,  by  their  own  ad¬ 
mission,  completely  satisfactory.  At  the  end  of  this  section,  an  exact  statistical  interpretation 
of  these  matrices  is  given  in  terms  of  the  complete  information  matrix  for  the  state  estimates 
Pj.  Specifically,  it  is  shown  that  the  error-covariance  matrices  from  the  Kalman  smoothing 
filter  for  pj  in  the  PMHT  model  are  the  diagonal  blocks  of  the  inverse  of  the  posterior  com¬ 
plete  information  matrix  Iq\x{0\x)  corresponding  to  pj.  Consequently,  these  error-covariance 
matrices  do  not  account  for  the  information  lost  to  the  missing  data  and,  thus,  are  overly  op¬ 
timistic  estimates  of  estimation  error. 

5.2.2  Posterior  Observed  Information  Matrix  Computation 

The  posterior  observed  information  matrix  /©|y(0||/)  for  the  linear  Gauss-Markov  dy¬ 
namic  mixture  is,  by  (3-24),  the  sum  of  the  observed  information  matrix  lY\e{y\0)  for  the 
linear  Gaussian  mixture  measurement  model  (5-36),  and  the  prior  observed  information  ma¬ 
trix  /©(0)  for  the  Gauss-Markov  process  model  (5-34).  The  observed  information  matrix 
lY\e{y\0)  is  similar  to  that  for  the  deterministic  case,  as  given  by  expressions  (5-12)  through 
(5-23),  except  that  the  sub-block  for  each  state  vector  pj  is  itself  a  block  matrix,  with  sub- 
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blocks  corresponding  to  the  kinematie  state  of  the  source  at  times  t  =  0, 1, . . . ,  T.  In  partie- 
ular,  the  observed  information  matrix  for  this  ease  is  an  ((m  —  1)  +  g(T  +  l)m)  x  ((m  — 
1)  +  q{T  +  l)m)  bloek  matrix,  where  the  (m  —  1)  x  (m  —  1)  bloek  in  the  upper  left-hand 
eomer  eontains  the  information  eontribution  from  the  first  m  —  1  mixing  proportions,  and 
the  g(T  -|-  l)m  x  q{T  +  l)m  bloek  in  the  lower  right-hand  eorner  eontains  the  information 
eontribution  from  the  m  eoneatenated  state  veetors  jjij,  each  of  length  q{T  -f  1).  Substitut¬ 
ing  the  eomplete  data  support  funetion  obtained  from  (5-35)  into  the  analogs  of  expressions 
(4-17)  through  (4-19)  for  data  eolleeted  over  multiple  sampling  times  gives  the  following 
eomputations  for  the  information  matrix  lY\e{y\0)- 

a.  From  the  time-dependent  form  of  (4-19), 


WjujHj  '^mi  j  1,  .  .  .  ^Tfl  1, 

0  WjuMj^R~\yu  -  j  =  1, . . . ,  m. 


(5-47) 

(5-48) 


b.  From  the  time-dependent  form  of  (4-18), 


(5-51) 


e.  From  the  time-dependent  form  of  (4-17), 


0, 


j  =  I, 
j  7^  I, 

=  (5-53) 


/ 


ti/'n-jlJ.i 


e 

0, 


_gO  (g)  ^{yu  -  MmtymtV RmtMmt,  j  =  I,  .  .  .  ,771  -  1,  I  =  771. 


(5-54) 
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Recall  that  the  prior  information  matrix  Ie{9)  is  the  negative  second  derivative  of  the  prior 
support  function  Xeid)  =  log  /e(0).  Hence,  from  (5-34), 


V,,  {V„  AeW}’'  = 

0, 

j,  /  =  1, . . .  ,m  -  1, 

(5-55) 

W,  {V„  AeW}""  = 

1  I  (prior)  jj  j 

(5-56) 

(0, 

j  7^ 

V,,  {V„  AeW}""  = 

0, 

(5-57) 

Let  aj,  f3i  denote  any  two  parameters  in  the  set  {tti,  . . . ,  vr^-i,  /io,  ■  ■  ■ ,  /^m},  and  let 
denote  the  sub-block  of  the  posterior  observed  information  matrix  associated  with  the  es¬ 
timates  aj,(3i.  Then,  from  (3-24)  and  the  time-dependent  form  of  (3-19),  using  the  above 
shorthand, 

T  nt  T  nt  T  m 

t=l  i=l  t=\  i=\  t=l  i=\ 

-  Va,  {v^,Ae(0)}^.  (5-58) 

Substituting  (5-47)  through  (5-57)  into  (5-58),  it  follows  that  the  terms  in  (5-49)  cancel  with 
the  terms  in  (5-52).  These  results  lead  to  the  following  simplifications  of  the  posterior  ob¬ 
served  information  matrix  for  linear  Gauss-Markov  dynamic  mixtures: 

T  nt 

=  (5-59) 

t=i  i=i 
T  nt 

=  j^l.  (5-60) 

t=i  i=i 

Again,  use  of  the  empirical  Fisher  information  matrix  as  an  approximation  to  the  observed 
information  matrix  is  not  appropriate  in  this  case,  since  the  observations  yu  are  not  identically 
distributed  across  sampling  times  due  to  source  motion. 

5.2.3  Interpretation  ofPMHT  Error-Covariance  Matrices 

Finally,  the  connection  between  the  error-covariance  matrices  for  the  state  estimates  fij 
obtained  from  the  equivalent  Kalman  smoothing  filters  for  the  M-step  at  the  final  EM  iteration 
for  the  linear  Gauss-Markov  mixture  model  and  the  posterior  complete  information  matrix  for 
these  estimates  needs  to  be  established.  The  first  and  last  terms  in  (5-58)  constitute  the  sub¬ 
block  of  the  posterior  complete  information  matrix  associated  with  the  estimates  dj  and  /3g 
the  middle  two  terms  in  this  expression  represent  the  information  lost  to  the  missing  data. 
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Let  [Ie\x]iifi  denote  the  q{T  +  l)m  x  q{T  +  l)m  bloek  of  the  complete  information  matrix 
associated  with  all  of  the  kinematic  state  vectors,  and  let  denote  the  jth  diagonal 

q{T  +  1)  X  q(T  +  1)  sub-block  of  this  matrix.  Substituting  (5-50)  and  (5-56)  into  the  first  and 
last  terms  of  (5-58)  and  using  the  synthetic  measurement  covariance  matrices  (5-7)  gives 

m  m 

^  ^  ®  ^  ^  ®  \,^{data)j  “f  ^(prior)j^-  (5-61) 

i=i  i=i 

This  result  may  be  interpreted  in  terms  of  the  equivalent  Kalman  smoothing  filters  for  the 
M-step  at  the  final  EM  iteration  (k  =  oo)  as  follows.  Consider  the  concatenated  form  of  the 
Kalman  smoothing  model.  In  particular,  let  {ct  :  t  =  1, ...  ,T}  be  the  collection  of  unit 
vectors  of  length  T,  where  the  fth  element  of  Cf  equals  one  and  all  other  elements  of  et  equal 
zero,  and  let  Etr  =  etej  for  all  f,  r  =  1, . . . ,  T.  Let  yj  =  ®  Vjt  be  the  concatenated 

synthetic  measurement  vector  for  source  j.  Then, 


Vj  =  Mjfij  +  7j, 


(5-62) 

(5-63) 


where 

Mj  =  0  X]t=i  ®  Mjt 

is  the  corresponding  concatenated  observation  matrix,  7^  is  a  normally  distributed  concate¬ 
nated  noise  vector  with  zero  mean  and  known  covariance  matrix  Rj  =  ®  Rjt,  and 

the  concatenated  state  vector  yj  is  normally  distributed  with  mean  vector 

T 

®  Vjt  (5-64) 


= 


t=o 


T  T 


and  covariance  matrix 

A  =  <5-«) 

t=0  r=0 

given  by  the  following  recursions  from  Theorem  15-5  in  Mendel  [36,  pp.#217-218]: 


Vjt  = 


Vj,  t  =  0, 

^jt,t—iVjj—\)  t  1, . . . ,  T, 


(5-66) 


and 


Pjtt  — 


PjtT  - 


r,, 

t  =  0, 

Pjt,t-iPj,t- 

+  Qjj-1, 

t  =  l,...,T, 

PjtrPjTTi 

t  >  T, 

..,T,  t^T, 

t,T  =  0,1,. 

PjttPjrti 

t  <  T, 

(5-67) 


(5-68) 
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where 


Fjtr  =  ■  ■  ■  Fj^r+1,T  for  t  >  T.  (5-69) 

From  Theorem  13-2  in  Mendel  [36,  p.#180],  the  MMSE  estimate  for  Hj  for  this  linear  Gaus¬ 
sian  model  is  given  by 

P‘{MMSE)j  =  Vj  +  PjMj {MjPjMj  +  Rj)  ^{jjj  —  MjVj),  (5-70) 

with  assoeiated  error-eovarianee  matrix 


=  (Pj-'  +  M]  (5-71) 

It  is  straightforward  to  show  that 

MjR-^M,=I^data)j.  (5-72) 

Furthermore,  it  is  shown  in  appendix  B  that 

Pj~^  =  I {prior)  j-  (5-73) 


Thus,  from  (5-61), 


^lj-(MMSE)j  i^j  F  [I{data)j  F  I{prior)j\  (5  74) 

that  is,  the  inverse  of  the  posterior  eomplete  information  matrix  for  the  estimate  fij  is  equal  to 
the  error-eovarianee  matrix  for  the  MMSE  estimate  for  /i^  given  the  synthetie  measurements 
and  synthetie  measurement  eovarianee  matriees  at  the  final  EM  iteration.  Moreover,  sinee  the 
fixed  interval  Kalman  smoothing  filter  is  just  an  effieient  algorithm  for  obtaining  the  MMSE 
estimates  (5-70)  and  the  diagonal  bloeks  of  the  error-eovarianee  matrix  (5-71),  it  follows  that 
the  error-eovarianee  matriees  for  the  smoothed  states  obtained  from  this  filter  are  the  diagonal 
bloeks  of  the  inverse  of  the  posterior  eomplete  information  matrix  for  the  estimate  jlj. 

The  posterior  observed  information  matrix  for  fij  ean  be  written  using  the  missing  in¬ 
formation  prineiple  as  in  (3-26).  Erom  (3-26)  and  (5-74), 


[Ie\Y]pjPi  -  Pp^^MSE)j  ~  (5-75) 

where  [Ix\Y,e\(ijiij  is  the  information  lost  to  the  missing  data.  Thus,  while  it  is  tempting 
to  interpret  the  error-eovarianee  matriees  from  the  Kalman  smoothing  filters  for  the  M-step 
of  the  final  EM  iteration  as  the  error-eovarianee  matriees  for  the  states  estimates  fij,  it  is 
elear  from  this  expression  that  these  matriees  provide  only  part  of  the  information  required 
to  eompute  error-eovarianee  matriees  for  fij.  In  short,  the  error-eovarianee  matriees  obtained 
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from  the  Kalman  smoothing  filters  do  not  aeeount  for  the  information  lost  to  the  missing 
measurement-to-source  assignments.  It  is  also  clear  from  (5-75)  that  the  posterior  observed 
information  matrix  for  fij  requires  computation  of  the/w/Z  MMSE  covariance  matrix  (5-71), 
and  not  just  the  diagonal  blocks  provided  by  the  Kalman  smoothing  filter. 

Furthermore,  in  general,  the  error-covariance  matrix  for  jlj  must  be  taken  from  the 
inverse  of  the  entire  posterior  observed  information  matrix  J©  y  for  all  estimated  source  states 
fi  and  their  mixing  proportions  vr,  because  in  general  Iq\y  is  not  block-diagonal;  that  is. 


-1 

jlj  jlj ' 


(5-76) 


Only  in  the  case  of  no  assignment  uncertainty  does  this  inequality  become  an  equality.  Indeed, 
from  expressions  (5-49)  through  (5-51),  (5-55)  through  (5-57),  and  (5-58),  it  follows  that  as 
the  information  in  the  missing  data  (the  contribution  from  the  second  and  third  terms  in  (5-58)) 
approaches  zero,  as  when  the  sources  move  farther  apart,  the  posterior  observed  information 
matrix  approaches  the  posterior  complete  information  matrix,  which  is  block-diagonal,  so 
that  from  (3-26),  (5-74),  and  (5-75), 


(5-77) 


r(MMSE)j  ■ 


Subsequently,  when  there  is  no  missing  data,  that  is,  when  the  measurement-to-source  assign¬ 
ments  are  known,  the  diagonal  blocks  of  the  inverse  of  the  posterior  observed  information 
matrix,  or  the  error-covariance  matrices  for  the  state  estimates  flj,  are  equal  to  the  error- 
covariance  matrices  obtained  from  the  Kalman  smoothing  filter,  which  is  expected  given  the 
assumptions  on  the  distributions  of  the  measurements  and  the  states.  Moreover,  the  inverse 
of  the  posterior  observed  information  matrix  for  the  estimates  jlj  is  equal  to  the  posterior 
Cramer-Rao  lower  bound  for  the  states  Hj  in  this  case. 


39(40  blank) 


6.  THEORETICAL  AND  PRACTICAL  CONSIDERATIONS 


At  least  two  issues  need  to  be  eonsidered  when  using  the  inverse  of  the  observed  infor¬ 
mation  matrix  as  an  estimate  of  the  error-eovarianee  matrix  for  dynamic  mixture  models:  the 
accuracy  of  the  normal  approximation  to  the  distribution  of  6,  and  the  cost  of  computing  the 
inverse.  These  issues  are  discussed  below. 

6.1  ASYMPTOTIC  NORMALITY  OF  6 

Recall  that  the  asymptotic  distribution  of  the  maximum  likelihood  estimate  9  is  normal 
with  mean  vector  9*  and  covariance  matrix  I~^{9*),  where  9*  is  the  “true”  value  of  the  param¬ 
eter  vector  9,  and  / (9*)  is  the  Fisher  information  matrix.  There  are  two  obvious  estimators  for 
the  asymptotic  error-covariance  matrix  namely,  I~^{9)  and  9).  In  [4],  Efron 

and  Hinkley  give  theory,  examples,  and  evidence  from  Fisher’s  original  writings  supporting  a 
preference  for  the  estimator  9)  over  I~^{9)  for  scalar  parameter  families.  The  simple 

example  at  the  beginning  of  their  paper  succinctly  illustrates  their  reasoning.  In  any  event, 
both  estimators  are  inferentially  valid  only  for  large  sample  size  n.  However,  with  regard  to 
the  scalar  parameter  examples  presented  in  their  paper,  Efron  and  Hinkley  note  that  “repeated 
sampling,  with  n  as  low  as  10,  seems  to  induce  normality  of  the  likelihood  rather  quickly.” 
On  the  other  hand,  McLachlan  and  Peel  [37]  state  that  “the  sample  size  n  has  to  be  very  large 
before  the  asymptotic  theory  applies  to  mixture  models.”  Determining  sufficient  sample  sizes 
for  appropriate  use  of  these  large  sample  approximations  to  the  error-covariance  matrix  for 
the  mixture  models  examined  in  this  report  requires  further  investigation. 

In  a  Bayesian  model  for  9,  the  distribution  of  the  maximum  a  posteriori  estimate  9 
depends  on  the  sample  size  n  and  the  nature  of  the  prior  distribution.  Asymptotically,  as 
77,  — >  oo,  the  distribution  of  9  approaches  the  distribution  of  the  maximum  likelihood  estimate 
for  9  discussed  above.  For  finite  sample  sizes,  the  distribution  of  9  depends  on  the  relative 
strengths  of  the  data  and  the  prior.  If  the  prior  is  relatively  weak,  the  distribution  of  9  will  be 
closer  to  that  of  the  maximum  likelihood  estimate.  On  the  other  hand,  in  the  absence  of  data, 
the  distribution  of  9  is  equivalent  to  the  prior  distribution  for  9. 

6.2  SEQUENTIAL  VERSUS  BATCH  PROCESSING 

Depending  on  the  number  of  sources  m  and  sampling  times  T,  the  posterior  observed 
information  matrix  for  stochastic  dynamic  mixture  models  can  be  costly  to  invert.  For  ex¬ 
ample,  the  number  of  parameters  to  be  estimated  in  the  linear  Gauss-Markov  mixture  model 
grows  roughly  linearly  with  m  and  T.  Specifically,  the  observed  information  matrix  for  this 
model  has  dimension  ((m  —  1)  +  q{T  -|-  l)m)  x  ((m  —  1)  +  q{T  -f  l)m),  where  q  is  the  length 
of  the  state  vector  for  each  source.  Suppose,  for  instance,  the  a;|/-positions  of  two  sources 
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(m  =  2)  moving  with  constant  velocity  (g  =  4)  are  observed  at  10  different  sampling  times 
(T  =  10).  The  posterior  observed  information  matrix  for  the  one  independent  mixing  propor¬ 
tion  and  both  sets  of  state  veetors  in  this  ease  has  dimension  89  x  89.  Computing  the  inverse 
of  this  matrix  requires  roughly  89^  ~  700  x  10^  operations.  There  are  perhaps  effieient  meth¬ 
ods  for  obtaining  this  inverse,  or  seleeted  portions  of  this  inverse  (for  example,  the  diagonal 
elements),  but  investigations  of  such  methods  are  beyond  the  seope  of  this  report. 

An  alternative  but  suboptimal  approaeh  for  eomputing  the  posterior  observed  infor¬ 
mation  matrix  for  stochastic  dynamic  mixture  models  is  to  reduee  the  size  of  the  matrix  by 
processing  the  data  sequentially.  In  this  approach,  the  data  eollected  at  each  sampling  time  are 
proeessed  as  if  they  were  the  only  data  eolleeted,  and  the  state  estimates  and  error-eovarianee 
matrices  eomputed  at  this  time  are  used  as  the  mean  veetors  and  eovarianee  matriees  of  the 
prior  distributions  for  the  states  at  the  next  sampling  time.  The  estimates  obtained  in  this  way 
are  suboptimal  in  that  they  are  conditioned  only  on  the  data  eollected  up  to  the  current  sam¬ 
pling  time,  and  not  the  entire  data  set.  In  the  language  of  Kalman  filtering  theory,  estimates 
obtained  by  proeessing  the  data  sequentially  are  ealled  filtered  estimates;  those  obtained  by 
eonditioning  on  the  entire  bateh  of  data  are  referred  to  as  smoothed  estimates.  The  reduetion 
in  the  number  of  computations  required  to  compute  error-covariance  matriees  in  this  subopti¬ 
mal  filtering  approach  can  be  substantial.  For  the  example  given  above,  the  posterior  observed 
information  matrix  for  the  one  independent  mixing  proportion  and  the  state  estimates  for  the 
two  sourees  at  eaeh  sampling  time  has  dimension  17  x  17.  Computing  the  inverses  of  these 
matriees  for  eaeh  of  the  sampling  times  requires  roughly  10  ■  17^  50  x  10^  operations,  a 

reduction  by  an  order  of  magnitude  over  the  optimal  smoothing  approach. 

While  the  error-eovarianee  matriees  for  the  state  estimates  are  eheaper  to  eompute  using 
the  filtering  approaeh  deseribed  above,  the  savings  eome  at  the  expense  of  aeeuraey  in  both 
the  state  estimates  and  the  error-covariance  matrices.  This  is  true  even  for  the  state  estimates 
and  error-eovarianee  matriees  obtained  at  the  final  sampling  time  T,  for  whieh  one  would 
think  smoothing  would  have  no  impaet.  When  there  is  no  measurement-to-souree  assignment 
uneertainty  (for  instance,  when  the  sourees  are  widely  separated),  the  filtered  estimates  and 
the  smoothed  estimates  of  the  souree  states  at  time  T,  and  the  assoeiated  error-covariance 
matrices,  are  identieal.  The  EM  iterations  for  this  ease  degenerate  to  a  single  iteration  that, 
in  terms  of  the  equivalent  Kalman  smoothers,  eorresponds  to  one  forward-baekward  pass 
over  the  synthetie  measurements  for  times  t  =  1, ...  ,T.  However,  when  there  is  signifieant 
interferenee  between  the  sources,  many  EM  iterations  may  be  required  for  the  state  estimates 
to  converge  to  their  final  values;  each  iteration  corresponds  to  a  forward-backward  pass  over 
the  synthetie  measurements,  whose  values  ehange  with  eaeh  pass  aeeording  to  the  updated 
eonditional  measurement-to-source  assignment  probabilities. 
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In  practice,  a  balance  between  the  filtering  and  smoothing  approaehes  ean  be  achieved 
by  implementing  the  algorithm  as  a  “sliding”  batch.  In  this  approach,  the  algorithm  is  first  run 
at  each  sampling  time  on  a  bateh  of  data  that  is  expanded  from  one  sampling  time  to  the  next 
until  it  reaehes  a  fixed  length.  When  this  length  is  reaehed,  the  batch  is  then  slid  forward  at 
eaeh  new  sampling  time,  so  that  the  data  at  the  current  sampling  time  are  added  to  the  bateh, 
and  the  data  from  the  oldest  sampling  time  in  the  bateh  are  removed  from  the  batch.  Let  p{t) 
denote  the  bateh  length  at  time  t,  and  let  p  denote  the  fixed  bateh  length.  Then,  the  batch 
length  at  time  t  is  given  by 


pit) 


t,  t  <  p, 

Pi  p  <t  <T. 


(6-1) 


Several  authors  have  proposed  similar  approaches  (see,  for  example,  Rago  et  al.  [38]  and 
Willett  et  al.  [39]),  and  most  have  noted  that  the  prior  distributions  for  the  states  in  each  batch 
must  be  determined  in  such  a  way  so  that  they  are  not  functions  of  data  in  the  current  bateh. 
The  prior  distributions  for  the  states  in  the  sliding  bateh  proposed  here  are  determined  as 
follows.  In  the  expanding  stage,  the  prior  mean  veetors  and  covariance  matriees  specified 
at  time  f  =  0  are  used  for  eaeh  batch.  In  the  sliding  stage,  the  state  estimates  and  error- 
eovarianee  matriees  from  the  bateh  at  time  t  —  p  are  used  in  the  prior  distributions  for  the 
bateh  at  time  t;  these  error-covarianee  matriees  are  eomputed  from  the  inverse  of  the  posterior 
observed  information  matrix  for  all  the  sources  for  the  batch  at  time  t  —  p.  This  approach  fixes 
a  problem  with  PMHT  not  identified  in  [39].  In  particular,  it  would  appear  that  the  sliding 
bateh  approaeh  proposed  in  [39]  uses  the  error-covarianee  matriees  obtained  from  the  inverse 
of  the  posterior  complete  information  matrix  as  priors  for  successive  batches;  it  was  shown  in 
the  previous  section  that  these  error-covarianee  matrices  are  too  small  when  there  is  significant 
measurement-to-source  assignment  uneertainty. 
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7.  EXAMPLES 


Two  target  tracking  examples  using  the  linear  Gauss-Markov  mixture  model  (that  is, 
the  PMHT  model)  are  presented  in  this  section.  The  first  example  is  of  two  constant-velocity 
crossing  targets.  This  example  is  idealized  in  the  sense  that  there  are  no  missed  detections 
(Fd  =  1)  and  no  false  alarms  {Pfa  =  0).  The  second  example  is  of  a  single  constant-velocity 
target  in  clutter  {Pfa  >  0).  This  example  is  further  complicated  by  the  possibility  of  missed 
detections  {Pd  <  1).  In  each  case,  the  consistency  of  the  target  state  estimates  is  examined. 
As  described  in  the  next  section,  consistency  in  this  context  is  a  measure  of  how  well  the 
estimated  error-covariance  matrices  reflect  the  actual  errors  in  the  state  estimates. 

7.1  ESTIMATOR  CONSISTENCY 

7.1.1  Parametric  Test 

Let  jlj{t\t)  denote  the  state  estimate  of  target  j  at  time  t  given  a  batch  of  measurements 
of  length  p{t)  >  1  with  leading  edge  at  time  t  and  trailing  edge  at  time  t  —  p{t) ,  and  let  Cj  {t\t) 
denote  the  corresponding  error-covariance  matrix.  When  there  is  no  assignment  uncertainty 
(for  instance,  when  the  target  measurements  are  labeled,  or  when  the  targets  are  widely  sep¬ 
arated)  and  under  the  linear  Gauss-Markov  model,  the  posterior  distribution  of  the  state  pjt 
given  the  batch  of  measurements  yt-p{t) ,  •  •  • ,  l/t  is  the  normal  distribution  with  mean  vector 
Pj{t\t)  and  covariance  matrix  Cj{t\t).  Let  Pj{t\t)  =  pjt  —  Pj{t\t)  denote  the  estimation  error. 
Under  these  assumptions,  it  follows  that 


E[pj{t)\yt_ppp...,yt\  =  0, 

cov{pj{t)\yt-p(t),---,yt)  =  Cj{t\t). 


(7-1) 

(7-2) 


A  state  estimator  is  said  to  be  consistent  if  the  estimation  errors  have  these  two  properties. 
Said  another  way,  a  state  estimator  is  consistent  if  the  estimation  errors  have  zero  mean,  and 
their  covariances  equal  the  estimated  covariances.  (See  Bar-Shalom  and  Li  [40]  for  a  full 
discussion  of  estimator  consistency.) 

Recall  from  [40]  that  the  normalized  estimation  error  squared  (NEES)  for  target  j  at 
time  t  is  defined  as 


(7-3) 


Given  the  modeling  assumptions  and  ideal  conditions  described  above,  the  NEES  nj{t)  is  chi- 
square  distributed  with  mean  (degrees  of  freedom)  q,  where  q  is  the  length  of  the  state  vector 
Pjt.  Suppose  the  tracking  simulation  is  run  N  times.  Then,  one  can  compute  the  average 
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NEES  for  each  target: 


(7-4) 


1=1 


where  is  the  NEES  for  target  j  at  time  t  for  the  /th  run.  By  the  properties  of  chi 


square  distributed  random  variables,  it  follows  that  N  times  the  average  NEES  is  chi-square 
distributed  with  Nq  degrees  of  freedom.  Hence,  to  test  for  estimator  consistency,  one  can  test 
the  simple  hypothesis 


Hq  :  Nujit)  is  chi-square  distributed  with  mean  Nq 


(7-5) 


at  each  time  t.  This  is  a  two-sided  test,  the  alternative  hypothesis  Hi  being  that  the  average 
NEES  Vj  (t)  has  mean  less  than  or  greater  than  Nq.  The  critical  region  for  this  test  for  a  fixed 
size  (level  of  significance)  a  is  typically  taken  to  be  the  lower  and  upper  tails  of  the  chi-square 
distribution,  each  with  probability  mass  a/ 2.  Let  denote  the  point  in  the  interval  [0,  oo) 
such  that  the  left-tail  probability  of  the  chi-square  distribution  with  degrees  of  freedom  ^  is 
r.  Then,  the  acceptance  region  (complement  of  the  critical  region)  for  this  two-sided  test  is 
the  interval  [xArg(<^/2),  XArq(l  “  Q:/2)].  Simply  put,  if  the  null  hypothesis  Hq  is  true,  then 
on  average  (1  —  a) %  of  the  average  NEES  values  Vj{t),  t  =  1, . . . ,  T,  will  fall  within  the 
acceptance  region. 

7.1.2  Nonparametric  Test 

The  test  for  estimator  consistency  based  on  the  average  NEES  value  Uj  (t)  is  standard 
in  the  tracking  literature  [40].  Eor  comparison,  an  alternative  test  based  on  the  sample  (or 
empirical)  distribution  function  of  the  NEES  values  . . . ,  is  proposed.  Eor 

detailed  discussions  of  tests  of  fit  based  on  the  empirical  distribution  function  (EDE),  see 
Cramer  [29,  section  30.8],  D’Agostino  and  Stephens  [41],  and  Stuart  et  al.  [42,  sections 
25.35-25.44].  Eor  the  remainder  of  this  discussion,  consider  an  arbitrary  but  fixed  target  j 
at  an  arbitrary  but  fixed  time  t.  Let  C(/)  denote  the  /th  order  statistic  of  the  NEES  values 
•  •  • ,  so  that  Qi)  <  ■  ■  ■  <  C(N)-  The  EDE  for  this  sample  is  defined  by 


0,  C  <  C(i), 

Fn  iC)  =  I/N,  C(0  <  C  <  C(Z+1), 
1,  C(iv)  <  C- 


(7-6) 


There  are  several  statistics  based  on  the  EDE  used  to  test  against  the  hypothesized  distribution 
of  the  sample.  The  most  well  known  is  the  Kolmogorov  (K)  statistic 


D^  =  sup|F^(C)-F(C)|, 


(7-7) 


C 


46 


where  F{()  is  the  true  (hypothesized)  distribution  function  for  the  sample  (in  this  case,  the 
chi-square  distribution  with  mean  q).  Less  well  known  are  those  from  the  Cramer-von  Mises 
family  of  statistics 

/OO 

{F^{C)-F{OY^l^{C)dF{C),  (7-8) 

•OO 

where  V'(C)  non-negative  weighting  functions.  Of  these,  the  two  most  studied  are  the 
statistic  corresponding  to  '0(C)  =  called  the  Cramer-von  Mises  (CVM)  statistic,  denoted 
Wn,  and  the  statistic  corresponding  to  0(C)  =  [F((C)(1  —  F{())]~^,  called  the  Anderson- 
Darling  (AD)  statistic,  denoted  Aj^.  Each  of  the  statistics  D^,  Wn,  and  An  is  a  distance 
measure  between  the  EDF  i0v(C)  and  the  hypothesized  distribution  function  F((),  and  each 
of  these  statistics  can  be  used  in  a  test-of-fit  with  simple  hypothesis 

Hf)  :  . . . ,  come  from  a  chi-square  distribution  with  mean  q.  (7-9) 


This  test  is  usually  cast  as  a  one-sided  (upper-tail)  test.  (See  [41,  section  4.5.1]  for  the  reason¬ 
ing  behind  this.)  Percentage  points  (critical  values)  for  these  statistics  for  various  significance 
levels  a  are  given  in  [41,  table  4.2,  p.#150]  and  in  [42,  p.#420];  the  null  hypothesis  Hq  is 
rejected  when  these  values  are  exceeded. 

Properties  of  these  non-parametric  statistics  are  discussed  at  length  in  [41].  In  summary, 
Dat  is  often  much  less  powerful  than  Wn  and  An,  meaning  that  tests  based  on  Dn  often  have 
a  lower  probability  of  accepting  the  alternative  hypothesis  Hi  when  Hi  is  true  than  tests 
based  on  Wn  and  An',  each  of  these  statistics  is  sensitive  to  deviation  from  the  mean  of 
the  hypothesized  distribution  F(C);  An  often  behaves  similarly  to  Wn,  but  is  usually  more 
powerful  when  the  EDF  Fn{C)  deviates  from  the  hypothesized  distribution  F{Q  in  the  tails. 

Computation  of  the  EDF  statistics  Dn,Wn,  and  An  is  typically  accomplished  using  the 
probability  integral  transformation  =  FiyJ^it)).  In  particular,  if  F  is  the  true  distribu¬ 
tion  function  for  the  random  variables  then  the  random  variables  v^\t)  are  uniformly 

distributed  between  0  and  1,  and  the  original  test-of-fit  becomes  a  test-of-fit  between  the  EDF 
for  the  transformed  variables  (t)  and  the  standard  uniform  distribution  function.  Eet  vq) 
denote  the  Ith  order  statistic  of  the  values  . . .  ,Vj^\t),  so  that  t;(i)  <  ■  ■  ■  <  V[n)- 

Then,  the  statistics  Dn,Wn,  and  A^r  are  given  by 


Dn 


max 


max 

i 


I 

N 


-vq) 


max 

i 


(7-10) 


Wn 


1 


+5: 


2/  -  1 
2N 


An  =  -N  -  [(2^  -  1)  logr;(z)  -f  {2N  +  1-21)  log(l  -  !;(/))]  . 

'  1=1 


(7-11) 

(7-12) 
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For  the  examples  presented  here,  the  values  Vj''^  (t)  are  obtained  by  evaluating  the  ehi-square 
distribution  funetion  for  q  degrees  of  freedom  at  the  NEES  values  (t)  for  eaeh  target  j  at 
each  time  t  for  simulations  /  =  1, . . . ,  N.  Proofs  related  to  the  probability  integral  transfor¬ 
mation  are  found  in  [41,  chapter  6]  and  Stuart  and  Ord  [43,  section  1.27]. 

7.2  TWO  CROSSING  TARGETS 

In  this  example,  two  constant-velocity  targets  cross  in  the  x|/-plane;  that  is,  the  two 
targets  share  the  same  xy-position  at  some  time  tc-  (Such  a  scenario  is  possible,  for  instance, 
when  two  aircraft  cross  paths  at  different  altitudes.)  At  time  f  =  0,  targets  1  and  2  are  at 
a:|/-positions  (1,0)  and  (2,0),  with  x|/-velocities  (0.05,2)  and  (—0.05,2),  respectively.  It 
follows  that  the  targets  cross  paths  at  time  tc  =  10.  A  single  measurement  of  each  target’s 
x|/-position  is  obtained  at  each  time  t  =  1, . . . ,  25,  for  a  total  of  50  observations  over  the 
entire  scenario.  These  measurements  have  a  standard  deviation  of  0.075  in  each  dimension. 
The  distance  between  the  two  targets  in  the  x-dimension  in  units  of  measurement  standard 
deviation  is  shown  in  figure  1.  Eor  consistency  with  the  assumption  that  a  single  measurement 
of  each  target  is  obtained  at  each  time,  the  mixing  proportions  tti  and  112  are  each  set  to  0.5 
and  held  fixed  for  the  simulation.  Einally,  the  mean  vector  for  the  prior  distribution  for  each 
target  is  taken  to  be  the  true  position  and  velocity  vector  of  each  target  at  time  f  =  0,  so 
that  7]i  =  (1,  0,  0.05,  2)  and  rj2  =  (2,  0,  —0.05,  2).  The  prior  covariance  and  process  noise 
covariance  matrices  for  each  target  are  taken  to  be 

r,=diag((l,  1,0.1,  0.1))  (7-13) 

and 

Q,i  =  10-Miag((l,  1,0.1,  0.1))  (7-14) 

for  j  =  1,  2,  and  t  =  0, . . . ,  24,  respectively,  where  diag(f )  is  the  diagonal  matrix  with  the 
elements  of  the  vector  v  on  the  diagonal. 

This  simulation  was  run  100  times  for  each  of  four  batch  lengths:  p  =  25,  10,  5,  and  1. 
Eor  each  run,  the  NEES  values  (7-3)  were  computed  twice,  once  using  the  posterior  observed 
information  matrix,  that  is,  with 

C-'m  =  |/e|KlA,(.|<).  (7-15) 

and  once  using  the  posterior  complete  information  matrix,  that  is,  with 

=  [Ie\x]fijit\t),  (7-16) 

where  [Ie\Y]fij(t\t)  and  [Ie\x]fij{t\t)  denote  the  sub-blocks  of  the  posterior  observed  and  pos¬ 
terior  complete  information  matrices,  respectively,  associated  with  the  state  estimate  jlj{t\t). 
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In  each  case,  the  average  NEES  values  (7-4)  over  the  100  runs  were  computed,  as  well  as  the 
K,  CVM,  and  AD  statistics  (7-10),  (7-11),  and  (7-12).  The  average  NEES  curves  for  the  four 
batch  lengths  are  plotted  in  figures  2  through  5.  The  K,  CVM,  and  AD  curves  are  plotted  in 
figures  6  through  9. 

In  figures  2  through  5,  the  horizontal  solid  line  indicates  the  mean  of  the  chi-square 
distribution  with  degrees  of  freedom  g  =  4;  the  area  between  the  horizontal  dashed  lines  indi¬ 
cates  the  95%  acceptance  region  for  the  null  hypothesis  (7-5).  It  is  clear  from  these  plots  that 
the  posterior  complete  information  matrix  J©|x  yields  inconsistent  estimates  of  estimation 
error  in  the  vicinity  of  the  crossing.  Indeed,  the  average  NEES  curves  associated  with  J©|x 
rise  well  above  the  acceptance  regions  near  the  crossing  regardless  of  batch  length,  indicating 
overly  optimistic  estimates  of  estimation  error;  said  another  way,  the  error-covariance  matri¬ 
ces  computed  using  the  posterior  complete  information  matrix  are  too  small  in  the  vicinity  of 
the  crossing,  where  the  measurement-to-source  assignment  uncertainty  is  large.  In  this  region, 
the  information  lost  to  the  missing  data  (measurement-to-source  assignments)  is  significant, 
and  the  missing  information  term  (the  second  term)  in  expression  (5-75)  for  the  posterior  ob¬ 
served  information  matrix  /©|y  is  nonzero.  Erom  (5-75),  it  follows  that  the  error-covariance 
matrix  computed  using  the  posterior  observed  information  matrix  J©  y  is  always  at  least  as 
large  as  the  error-covariance  matrix  computed  using  the  posterior  complete  information  ma¬ 
trix  /©|x.  Hence,  from  (7-3)  and  (7-4),  it  follows  that  the  average  NEES  curves  computed 
using  /©|y  in  figures  2  through  5  are  always  bounded  above  by  those  computed  using  Ie\x- 

It  is  also  clear  from  these  figures  that  estimator  consistency  deteriorates  with  smaller 
batch  length,  in  the  sense  that  more  average  NEES  values  fall  outside  of  the  acceptance  region 
as  batch  length  decreases.  This  result  in  summarized  in  table  1,  which  records  the  percentages 
of  the  average  NEES  values  for  both  targets  and  for  all  25  sample  times  that  fall  within  the 
95%  acceptance  region.  The  shaded  boxes  in  this  table  contain  the  percentages  associated 
with  the  average  NEES  values  computed  using  the  posterior  observed  information  matrix. 
The  containment  statistics  for  batch  lengths  of  25  and  10  indicate  that  the  inverse  of  the 
posterior  observed  information  matrix  gives  a  consistent  estimate  of  estimation  error  for  this 
example;  in  both  cases,  95.8%  of  the  average  NEES  values  fall  within  the  95%  containment 
region.  The  containment  statistics  drop  by  4.1  and  10.4  percentage  points  for  batch  lengths 
of  5  and  1,  respectively.  Interestingly,  the  containment  statistics  for  the  average  NEES  values 
computed  using  the  posterior  complete  information  matrix  increase  with  decreasing  batch 
length.  These  results  are  recorded  in  the  unshaded  boxes  in  table  1 .  The  reason  for  this  trend 
is  not  clear.  In  any  event,  these  containment  statistics  are  always  worse  than  the  corresponding 
statistics  computed  using  the  posterior  observed  information  matrix,  and  all  are  well  below 
the  expected  95%  containment  level. 
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Figures  6  through  9  show  plots  of  the  K,  CVM,  and  AD  statisties  as  funetions  of  sam¬ 
pling  time  for  eaeh  of  the  four  bateh  lengths.  Reeall  that  the  test  of  estimator  eonsisteney 
based  on  these  statisties  is  one-sided;  the  area  below  the  horizontal  dashed  line  in  these  plots 
is  the  95%  aeeeptanee  region  for  the  test.  All  of  the  results  diseussed  above  for  the  average 
NEES  eurves  hold  for  the  K,  CVM,  and  AD  eurves  shown  here,  with  one  exeeption:  the 
eurves  for  the  K,  CVM,  and  AD  statisties  eomputed  using  the  posterior  observed  information 
matrix  J©  y  are  not  neeessarily  bounded  above  by  those  eomputed  from  the  posterior  eom- 
plete  information  matrix  /©  x-  Nevertheless,  the  eontainment  statisties  in  table  1  indieate  that 
is  a  eonsistent  estimate  of  estimation  error,  while  Jqj^  is  not.  Of  the  three  statisties,  the 
AD  statistie  is  elosest  in  behavior  to  the  average  NEES. 


Figure  1.  Distance  Between  Targets  in  x-Dimension  in  Units  of  Measurement  Standard 

Deviation  for  Crossing  Targets  Example 
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(a)  Target  1.  (b)  Target  2. 

Figure  2.  Average  NEES  with  95%  Acceptance  Region  for  Crossing  Targets  Example  with 
Batch  Length  25,  Computed  Using  Posterior  Complete  Information  Matrix  (crosses)  and 
Posterior  Observed  Information  Matrix  (circles) 


(a)  Target  1.  (b)  Target  2. 

Figure  3.  Average  NEES  with  95%  Acceptance  Region  for  Crossing  Targets  Example  with 
Batch  Length  10,  Computed  Using  Posterior  Complete  Information  Matrix  (crosses)  and 
Posterior  Observed  Information  Matrix  (circles) 
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(a)  Target  1.  (b)  Target  2. 

Figure  4.  Average  NEES  with  95%  Acceptance  Region  for  Crossing  Targets  Example  with 
Batch  Length  5,  Computed  Using  Posterior  Complete  Information  Matrix  (crosses)  and 
Posterior  Observed  Information  Matrix  (circles) 


Average  NEES  for  Target  2  with  95%  acceptance  region 


(a)  Target  1. 


(b)  Target  2. 


Figure  5.  Average  NEES  with  95%  Acceptance  Region  for  Crossing  Targets  Example  with 
Batch  Length  I,  Computed  Using  Posterior  Complete  Information  Matrix  (crosses)  and 
Posterior  Observed  Information  Matrix  (circles) 
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Kolmogorov  Statistic  for  Target  1  with  95%  Acceptance  Region  Kolmogorov  Statistic  for  Target  2  with  95%  Acceptance  Region 


Cramer-von  Mises  Statistic  for  Target  1  with  95%  Acceptance  Region  Cramer-von  Mises  Statistic  for  Target  2  with  95%  Acceptance  Region 


Anderson-Darling  Statistic  for  Target  1  with  95%  Acceptance  Region  Anderson-Darling  Statistic  for  Target  2  with  95%  Acceptance  Region 


(a)  Target  1.  (b)  Target  2. 

Figure  6.  K,  CVM,  and  AD  Statistics  with  95%  Acceptance  Regions  for  Crossing  Targets 
Example  with  Batch  Length  25,  Computed  Using  Posterior  Complete  Information  Matrix 
(crosses)  and  Posterior  Observed  Information  Matrix  (circles) 
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Kolmogorov  Statistic  for  Target  1  Vifith  95%  Acceptance  Region  Kolmogorov  Statistic  for  Target  2  with  95%  Acceptance  Region 


Cramer-von  Mises  Statistic  for  Target  1  with  95%  Acceptance  Region  Cramer-von  Mises  Statistic  for  Target  2  with  95%  Acceptance  Region 


Anderson-Darling  Statistic  for  Target  1  with  95%  Acceptance  Region  Anderson-Darling  Statistic  for  Target  2  with  95%  Acceptance  Region 


(a)  Target  1.  (b)  Target  2. 

Figure  7.  K,  CVM,  and  AD  Statistics  with  95%  Acceptance  Regions  for  Crossing  Targets 
Example  with  Batch  Length  10,  Computed  Using  Posterior  Complete  Information  Matrix 
(crosses)  and  Posterior  Observed  Information  Matrix  (circles) 
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Kolmogorov  Statistic  for  Target  1  with  95%  Acceptance  Region 


Time 


Kolmogorov  Statistic  for  Target  2  with  95%  Acceptance  Region 


Time 


Cramer-von  Mises  Statistic  for  Target  1  with  95%  Acceptance  Region 


Time 


Cramer-von  Mises  Statistic  for  Target  2  with  95%  Acceptance  Region 


Time 


Anderson-Darling  Statistic  for  Target  1  with  95%  Acceptance  Region 


Time 


Anderson-Darling  Statistic  for  Target  2  with  95%  Acceptance  Region 


Time 


(a)  Target  1.  (b)  Target  2. 

Figure  8.  K,  CVM,  and  AD  Statistics  with  95%  Acceptance  Regions  for  Crossing  Targets 
Example  with  Batch  Length  5,  Computed  Using  Posterior  Complete  Information  Matrix 
(crosses)  and  Posterior  Observed  Information  Matrix  (circles) 
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Kolmogorov  Statistic  for  Target  1  with  95%  Acceptance  Region  Kolmogorov  Statistic  for  Target  2  with  95%  Acceptance  Region 


Cramer-von  Mises  Statistic  for  Target  1  with  95%  Acceptance  Region 


Cramer-von  Mises  Statistic  for  Target  2  with  95%  Acceptance  Region 


Anderson-Darling  Statistic  for  Target  1  with  95%  Acceptance  Region 


Time 


Anderson-Darling  Statistic  for  Target  2  with  95%  Acceptance  Region 


Time 


(a)  Target  1.  (b)  Target  2. 

Figure  9.  K,  CVM,  and  AD  Statistics  with  95%  Acceptance  Regions  for  Crossing  Targets 
Example  with  Batch  Length  1,  Computed  Using  Posterior  Complete  Information  Matrix 
(crosses)  and  Posterior  Observed  Information  Matrix  (circles) 
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Table  1.  Percentage  of  NEES,  K,  CVM,  and  AD  Values  That  Fall  Within  Their  Respective 
95%  Acceptance  Regions  for  the  Crossing  Targets  Example  (The  first  and  second  rows  for 
each  statistic  correspond  to  use  of  the  posterior  complete  and  posterior  observed 
information  matrices,  respectively,  to  compute  the  statistic.) 


Statistic 

25 

Batch 

10 

^ength 

5 

1 

NEES 

64.6 

70.8 

70.8 

77.1 

95.8 

95.8 

91.7 

85.4 

K 

68.8 

64.6 

66.7 

75.0 

9L7 

87.5 

85.4 

85.4 

CVM 

68.8 

70.8 

70.8 

79.2 

97.9 

91.7 

89.6 

89.6 

AD 

66.7 

68.8 

66.7 

75.0 

97.9 

95.8 

91.7 

89.6 
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7.3  SINGLE  TARGET  IN  CLUTTER 


In  this  example,  a  single  eonstant-veloeity  target  travels  in  the  xy-plane,  with  non-unity 
probability  of  detection  and  non-zero  probability  of  false  alarm  Pfa-  In  particular,  a  Pd 
of  0.9  is  assumed  fixed  and  known.  Furthermore,  it  is  assumed  that  at  each  sampling  time  t, 
nc{t)  uniformly  distributed  clutter  points  are  observed  in  a  square  coverage  region  centered  at 
the  true  position  of  the  target  and  with  sides  of  length  20r,  where  r  =  0.075  is  the  a;|/-position 
measurement  standard  deviation  in  each  dimension  from  the  previous  example.  The  number 
of  clutter  points  ndt)  is  assumed  to  be  Poisson  distributed  with  mean  XcV  =  4,  where  V 
is  the  volume  of  the  coverage  region  (in  this  case  1.5  x  1.5  =  2.25),  and  Ac  is  the  clutter 
density  in  this  region  (Ac  =  1.78  in  this  case).  Thus,  on  average,  four  uniformly  distributed 
clutter  points  are  expected  in  a  1.5  x  1.5  region  about  the  true  target  position;  the  probability 
of  observing  at  least  one  clutter  point  in  this  region  is  Prob{nc(f)  >  0}  =  0.98. 

At  time  t  =  0,  the  target  is  at  a:|/-position  (1,  0)  with  x|/-velocity  (0, 2).  At  each  time 
t  =  1, . . . ,  55,  at  most  one  measurement  of  target  a:|/-position,  and  nc{t)  clutter  points  (false 
measurements)  are  obtained,  each  distributed  as  described  above.  The  mean  vector  for  the 
prior  distribution  of  the  target  is  taken  to  be  the  true  position  and  velocity  vector  of  the  target 
at  time  t  =  0,  so  that  rji  =  (1,  0,  0,  2).  The  prior  covariance  matrix  for  the  target  is  taken  to  be 
1  X  10“^  times  the  matrix  (7-13),  and  the  process  noise  covariance  matrix  is  taken  to  be  the 
matrix  (7-14).  The  prior  distribution  for  the  target  state  at  time  to  is  made  more  informative 
(via  the  multiplicative  factor  1  x  10“^)  in  this  example  to  compensate  for  the  well-known 
difficulty  of  initializing  a  tracker  in  clutter.  There  are  various  other  ways  to  address  this 
problem,  but  they  are  outside  the  scope  of  this  report. 

The  PMHT  model  as  described  in  section  5.2  must  be  modified  to  account  for  false 
alarms;  that  is,  a  clutter  model  must  be  added  to  account  for  observations  that  do  not  originate 
from  a  target.  This  is  accomplished  as  described  in  Gauvrit  et  al.  [9]  by  adding  a  uniform  den¬ 
sity  function  to  the  mixture  density  function  for  each  observation.  The  impact  of  this  clutter 
model  on  the  update  equations  and  information  matrix  computations  for  the  linear  Gauss- 
Markov  mixture  is  primarily  confined  to  the  conditional  measurement-to-source  assignment 
probabilities.  Additionally,  some  of  the  information  matrix  expressions  (5-47)  through  (5-57) 
must  change  to  reflect  the  addition  of  the  clutter  source  to  the  measurement  mixture  model. 
These  changes  are  listed  in  appendix  C.  Finally,  for  consistency  with  the  assumption  that 
at  most  one  measurement  originates  from  the  target,  the  target  mixing  proportion  vti  is  set 
to  0.18,  according  to  expression  (C-12),  which  accounts  for  the  probability  of  detection  Pd 
and  expected  number  of  false  alarms  XcV .  The  clutter  mixing  proportion,  denoted  tt2,  is  then 
1  —  TTi  =  0.82,  and  is  held  fixed  for  the  simulation. 


58 


This  simulation  was  run  100  times  for  each  of  the  four  batch  lengths  p  =  25,  10,  5, 
and  1 .  As  for  the  crossing  targets  example,  average  NEES  values  and  the  K,  CVM,  and  AD 
statistics  were  computed  over  these  runs  for  these  batch  lengths  using  both  the  posterior  com¬ 
plete  information  matrix  and  the  posterior  observed  information  matrix.  The  average  NEES, 
K,  CVM,  and  AD  curves  are  plotted  in  figures  10  through  13.  It  is  clear  from  these  plots 
that,  again,  the  posterior  complete  information  matrix  Iq\x  yields  inconsistent  estimates  of 
estimation  error.  On  the  other  hand,  the  posterior  observed  information  matrix  Iq\y  gives 
consistent  estimates  in  this  example,  at  least  for  large  enough  batch  length.  These  results 
are  summarized  in  table  2,  which  records  the  percentages  of  average  NEES,  K,  CVM,  and 
AD  values  that  fall  within  their  respective  95%  acceptance  regions.  As  for  the  crossing  tar¬ 
gets  example,  the  values  in  the  unshaded  boxes  correspond  to  the  statistics  computed  using 
the  posterior  complete  information  matrix;  the  values  in  the  shaded  boxes  correspond  to  the 
statistics  computed  using  the  posterior  observed  information  matrix.  Eor  the  percentages  in 
this  table,  only  those  statistics  computed  after  sampling  time  f  =  5  were  counted,  since  each 
of  the  statistics  is  initially  skewed  by  the  combination  of  informative  prior  information  and 
good  initialization.  A  randomized  initialization  scheme  would  perhaps  have  eliminated  this 
trend,  but  the  present  scheme  was  deemed  sufficient  for  this  demonstration.  In  any  event, 
the  containment  statistics  for  batch  length  25  indicate  that  is  a  consistent  estimate  of  the 
error-covariance  matrix  for  this  simulation.  Again,  as  for  the  crossing  targets  example,  the 
AD  statistic  is  closest  in  behavior  to  the  average  NEES. 
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{a)  p  =  25.  {b)  p  =  10. 

Figure  10.  Average  NEES  with  95%  Acceptance  Region  for  Single  Target  in  Clutter 
Example  with  Batch  Lengths  25  and  10,  Computed  Using  Posterior  Complete  Information 
Matrix  (crosses)  and  Posterior  Observed  Information  Matrix  (circles) 


(a)  p  =  5.  (b)  p=  1. 

Figure  11.  Average  NEES  with  95%  Acceptance  Region  for  Single  Target  in  Clutter 
Example  with  Batch  Lengths  5  and  1,  Computed  Using  Posterior  Complete  Information 
Matrix  (crosses)  and  Posterior  Observed  Information  Matrix  (circles) 
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Kolmogorov  Statistic  for  Target  1  with  95%  Acceptance  Region  Kolmogorov  Statistic  for  Target  1  with  95%  Acceptance  Region 


Cramer-von  Mises  Statistic  for  Target  1  with  95%  Acceptance  Region 


Cramer-von  Mises  Statistic  for  Target  1  with  95%  Acceptance  Region 


Anderson-Darling  Statistic  for  Target  1  with  95%  Acceptance  Region  Anderson-Darling  Statistic  for  Target  1  with  95%  Acceptance  Region 


(a)  p  =  25. 


(b)  p=10. 


Figure  12.  K,  CVM,  and  AD  Statistics  with  95%  Acceptance  Region  for  Single  Target  in 
Clutter  Example  with  Batch  Lengths  25  and  10,  Computed  Using  Posterior  Complete 
Information  Matrix  (crosses)  and  Posterior  Observed  Information  Matrix  (circles) 
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Kolmogorov  Statistic  for  Target  1  with  95%  Acceptance  Region  Kolmogorov  Statistic  for  Target  1  with  95%  Acceptance  Region 


Cramer-von  Mises  Statistic  for  Target  1  with  95%  Acceptance  Region  Cramer-von  Mises  Statistic  for  Target  1  with  95%  Acceptance  Region 


Anderson-Darling  Statistic  for  Target  1  with  95%  Acceptance  Region  Anderson-Darling  Statistic  for  Target  1  with  95%  Acceptance  Region 


(a)  p  =  5.  (b)  p=  1. 

Figure  13.  K,  CVM,  and  AD  Statistics  with  95%  Acceptance  Region  for  Single  Target  in 
Clutter  Example  with  Batch  Lengths  5  and  1,  Computed  Using  Posterior  Complete 
Information  Matrix  (crosses)  and  Posterior  Observed  Information  Matrix  (circles) 
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Table  2.  Percentage  of  NEES,  K,  CVM,  and  AD  Values  That  Fall  Within  Their  Respective 
95%  Acceptance  Regions  for  the  Single  Target  in  Clutter  Example  (The  first  and  second 
rows  for  each  statistic  correspond  to  use  of  the  posterior  complete  and  posterior  observed 
information  matrices,  respectively,  to  compute  the  statistic.) 


Statistic 

25 

Batch  I 

10 

.ength 

5 

1 

NEES 

0.0 

0.0 

0.0 

0.0 

96.0 

84.0 

46.0 

0.0 

K 

0.0 

2.0 

6.0 

8.0 

98.0 

88.0 

74.0 

26.0 

CVM 

0.0 

2.0 

2.0 

6.0 

100.0 

92.0 

76.0 

16.0 

AD 

0.0 

0.0 

2.0 

4.0 

96.0 

84.0 

66.0 

6.0 

63(64  blank) 


8.  CONCLUSIONS 


8.1  SUMMARY  OF  FINDINGS 

An  analytical  approach  for  computing  the  observed  information  matrix  for  an  important 
class  of  mixture  models,  called  dynamic  mixtures,  is  developed  in  this  report.  Dynamic 
mixtures  are  useful  models  for  data  originating  from  a  number  of  distinct  moving  sources. 
Multiple  target  tracking  is  one  application  of  these  models;  PMHT  is  the  primary  example  of 
a  dynamic  mixture -based  approach  to  multiple  target  tracking.  In  the  basic  PMHT  model,  a 
Gaussian  mixture  is  used  to  describe  the  distribution  of  the  measurements  from  each  target, 
and  a  linear  Gauss-Markov  process  model  is  used  to  describe  the  target  dynamics. 

An  important  finding  of  this  report  is  the  precise  statistical  interpretation  of  the  error- 
covariance  matrices  for  the  PMHT  track  estimates  in  terms  of  the  observed  information  matrix 
computations  for  these  estimates.  In  particular,  it  is  shown  that  the  error-covariance  matri¬ 
ces  obtained  from  the  Kalman  smoothing  filter  for  each  target  state  sequence  at  the  final  EM 
iteration  are  the  diagonal  blocks  of  the  inverse  of  the  posterior  complete  information  matrix 
for  each  sequence.  Therefore,  these  error-covariance  matrices  provide  only  part  of  the  infor¬ 
mation  required  to  compute  error-covariance  matrices  for  the  state  estimates.  In  short,  the 
error-covariance  matrices  obtained  from  the  Kalman  smoothing  filters  do  not  account  for  the 
information  lost  to  the  missing  data,  that  is,  the  missing  measurement-to-target  assignments. 

Another  important  finding  of  this  report  is  the  impact  of  measurement-to-source  as¬ 
signment  uncertainty  on  estimator  consistency.  Specifically,  for  two  common  target  tracking 
scenarios  (two  crossing  targets,  and  a  single  target  in  clutter),  it  is  shown  that  the  posterior 
complete  information  matrix  yields  inconsistent  estimates  of  estimation  error  when  there  is 
significant  assignment  uncertainty,  while  the  posterior  observed  information  matrix  gives  con¬ 
sistent  estimates  (for  sufficient  batch  length).  In  each  scenario,  the  standard  chi-square  test  for 
the  distribution  of  the  average  NEES  is  used  to  test  for  estimator  consistency.  Additionally, 
new  tests  for  estimator  consistency  based  on  the  EDE  of  the  NEES  are  introduced;  these  tests 
are  shown  to  produce  results  comparable  to  those  of  the  standard  NEES  test. 

8.2  ALTERNATIVE  APPROACHES 

While  Louis’s  approach  for  computing  the  observed  information  matrix  when  using  the 
EM  method  can  be  applied  to  any  incomplete  data  problem,  there  are  almost  surely  problems 
for  which  the  required  expressions,  though  based  on  complete  data  statistics,  are  difficult  to 
derive  analytically  or  compute  numerically,  or  both.  Eor  these  problems,  the  supplemented 
EM  (SEM)  method  of  Meng  and  Rubin  [5]  is  an  attractive  alternative.  Their  method  gener¬ 
alizes  an  observation  made  by  Smith  [44]  in  his  discussion  of  Dempster  et  al.  [1].  Based  on 
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his  analysis  of  the  standard  errors  of  a  simple  example  in  their  paper,  Smith  alludes  to  the 
following  general  relationship  between  the  observed  data  error- varianee  Vo  and  the  eomplete 
data  error-varianee  Vc  of  the  maximum  likelihood  estimate  for  the  sealar  parameter  9: 


V 


O 


(8-1) 


where,  using  the  language  of  this  report,  Vo  is  the  inverse  of  the  observed  information  matrix 
(a  sealar  in  this  ease),  is  the  inverse  of  the  eomplete  information  matrix,  and  r  is  the 
rate  of  eonvergenee  of  the  EM  method  whieh,  for  large  values  of  the  iteration  index  k,  is 
approximated  by 


r  = 


0{k+l)  _  Q{k) 
Q{k)  _  Q{k-1)- 


(8-2) 


Thus,  the  observed  data  error-varianee  is  obtained  by  inflating  the  eomplete  data  error-varianee 
by  the  faetor  1/(1  —r).  Meng  and  Rubin  rewrite  (8-1)  in  the  statistieally  more  appealing  form 


Vo  =  Vc  +  Av, 


(8-3) 


where 

Av  =  zr^Vc  (8-4) 

1  —  r 

is  interpreted  as  the  inerease  in  error-varianee  due  to  the  missing  data.  Among  the  eontribu- 
tions  of  their  paper  are  the  analogous  matrix  version  of  (8-3),  and  eomputations  for  the  matrix 
versions  of  Vc  and  r.  Computation  of  the  rate-of-eonvergenee  matrix  r  involves  numerieal 
differentiation  of  the  implieit  mapping  Af  :  0  — >  (1  from  the  parameter  spaee  0  to  itself 
defined  by  the  EM  method  sueh  that 

^(fc+i)  _  yy^(0(fc))  foj.  A:  =  0,1,2,....  (8-5) 

However,  unlike  approaehes  sueh  as  Carlin’s  [45]  that  use  numerieal  differentiation  to  obtain 
the  error-eovarianee  matrix  direetly  from  the  observed  data  support  funetion,  SEM  uses  only 
numerieal  differentiation  to  obtain  the  inerease  due  to  the  missing  data  to  be  added  to  the 
eomplete  data  error-eovarianee  matrix.  Henee,  Meng  and  Rubin  elaim  that  SEM  is  typieally 
more  stable  beeause  the  eorreetion  obtained  by  numerieal  differentiation  is  added  to  the  eom¬ 
plete  data  error-eovarianee  matrix,  whieh  often  ean  be  obtained  analytieally  and  is  usually 
the  dominant  term.  Meng  and  Rubin  do  not  inelude  mixture  models  among  the  examples 
in  their  paper,  although  there  is  no  impediment  to  using  SEM  for  this  problem.  The  use  of 
SEM  for  dynamie  mixture  models,  and  a  eomparison  of  this  approaeh  with  Eouis’s  approaeh 
developed  for  these  models  in  this  report,  are  left  as  future  work. 
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8.3  FUTURE  INVESTIGATIONS 


Several  topies  for  future  investigation  have  already  been  proposed.  These  inelude: 

1 .  The  applieation  of  the  SEM  method  to  dynamic  mixture  models,  and  a  comparison 
of  this  method  for  computing  the  error-covariance  matrix  with  Louis’s  approach. 

2.  An  examination  of  the  accuracy  of  the  observed  information  matrix  for  dynamic 
mixtures  as  a  function  of  sample  size,  and  a  comparison  of  the  observed  information  matrix 
with  the  Fisher  information  matrix  for  these  models. 

3.  The  exploration  of  efficient  methods  for  computing  the  inverse  of  the  observed  infor¬ 
mation  matrix  for  dynamic  mixture  models,  including  subop timal  procedures  for  computing 
the  error-covariance  matrix  for  large  problems. 

Additionally,  there  are  at  least  two  more  topics  worth  pursuing. 

The  other  contribution  of  Louis’s  paper  [3]  is  a  method  for  accelerating  convergence  of 
the  EM  iterations  using  the  observed  information  and  complete  information  matrices.  Specif¬ 
ically,  Louis  shows  that  the  updated  estimate  for  6  at  the  k\h  EM  iteration  can  be  refined  in 
place  via  the  following  step: 

Q{k)  (g-b) 

The  refinement  6^1^^  is  an  improvement  over  the  update  in  the  sense  that  the  former  is 
closer  io  9  =  than  the  latter.*  Application  of  this  acceleration  method  to  the  dynamic 
mixture  models  presented  in  this  report  would  appear  to  be  straightforward. 

Finally,  it  is  proposed  that  the  observed  information  matrix  computations  developed 
here  be  extended  to  dynamic  mixture  models  for  grouped  and  truncated  data.  In  practice,  data 
are  often  grouped  into  a  finite  number  of  observation  cells  either  intentionally  (for  example, 
to  simplify  data  collection)  or  unintentionally,  perhaps  due  to  limitations  of  the  data  collec¬ 
tion  process.  Additionally,  if  the  number  of  observations  in  an  observations  cell  cannot  be 
reported  for  any  reason,  the  grouped  data  are  said  to  be  truncated.  In  any  event,  grouping  and 
truncating  samples  introduces  additional  missing  data  into  the  estimation  problem,  namely, 
the  sample  locations  within  the  observed  cells,  and  the  numbers  of  samples  and  their  locations 
in  the  truncated  cells.  Consequently,  the  EM  method  is  a  natural  approach  to  maximum  like¬ 
lihood  estimation  for  these  problems.  This  approach  is  treated  by  several  authors,  including 
Dempster  et  al.  [1]  and  McLachlan  and  Jones  [46].  The  latter  authors  explicitly  treat  finite 
mixture  models  for  grouped  and  truncated  data. 

‘There  is  a  transposition  error  between  the  matrices  Ix  and  in  this  expression  in  Louis’s  paper  [3,  expression  (5.3)].  The  error  is 
con'ected  by  Meilijson  in  [35,  expression  (11)]. 
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Recent  work  has  been  done  on  estimation  for  stochastic  dynamic  mixture  models  for 
grouped  and  truncated  data.  In  particular,  Luginbuhl  [47]  and  Luginbuhl  and  Willett  [48,49] 
apply  the  PMHT  model  to  a  histogram  representation  of  discrete  time  Fourier  transform  data 
to  estimate  the  parameters  of  general  frequency  modulated  signals  in  noise.  Of  particular 
interest  to  this  discussion  is  a  derivation  in  [47]  of  the  Fisher  information  matrix  for  the 
parameters  in  a  univariate  Gaussian  mixture  approximation  to  a  one-dimensional  histogram. 
This  result  is  important  to  this  work,  as  it  indicates  the  potential  existence  of  a  closed-form 
Cramer-Rao  lower  bound  on  estimation  error  for  the  stochastic  dynamic  mixture  model,  and 
the  PMHT  model  in  particular,  for  grouped  and  truncated  data.  Hence,  this  result  should 
provide  an  opportunity  to  compare,  in  the  spirit  of  Efron  and  Hinkley  [4],  the  relative  accuracy 
of  the  observed  information  matrix  versus  the  Fisher  information  matrix  for  dynamic  mixture 
models  and,  by  extension,  for  multiple  target  tracking. 
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APPENDIX  A 

APPROXIMATION  TO  THE  OBSERVED  INFORMATION  MATRIX 


Use  of  the  empirieal  Fisher  information  matrix  (3-21)  as  an  approximation  to  the  ob¬ 
served  information  matrix  for  independent  and  identieally  distributed  data  is  justified  in  the 
following  sense.  (The  argument  presented  here  is  a  detailed  version  of  the  argument  in 
MeLaehlan  and  Krishnan  [50]).  Consider  the  information  matrix  (3-5)  for  independent  and 
identieally  distributed  observations: 
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where  the  support  funetions  Ay.  in  this  ease  are  all  the  same  funetion.  Reealling  that  Ay.  (i/g  6)  = 
log  /y.  [yi]  9)  and  manipulating  the  derivatives  on  the  right-hand  side  of  (A-1)  yields 
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Now,  the  expeeted  value  of  /y(|/;  9)  evaluated  at  9*,  the  true  value  of  9,  is  the  Fisher  infor¬ 
mation  matrix  I (9*).  But  the  seeond  term  in  the  previous  expression  has  zero  expeetation. 
Indeed, 
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where  interchangeability  of  derivatives  and  integrals  has  been  assumed.  Therefore,  insofar  as 
9^9*  and  /y  (y;  9)  — >  I {9*)  as  n  — >  oo,  it  follows  that  for  large  sample  sizes 
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See  MeLaehlan  and  Krishnan  for  examples  of  this  approximation,  and  Redner  and  Walker 
[34]  and  Meilijson  [35]  for  uses  of  the  empirical  observed  information  matrix  to  accelerate 
convergence  of  the  EM  method. 
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APPENDIX  B 

INVERSE  OF  THE  GAUSS-MARKOV  PRIOR  COVARIANCE  MATRIX 


The  equality  of  the  matrix  I  {prior),  given  by  expression  (5-44),  and  the  inverse  of  the 
Gauss-Markov  prior  covariance  matrix  P,  given  by  expression  (5-65)  and  recursions  (5-67) 
and  (5-68),  is  established  by  the  following  theorem: 
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Then,  for  each  positive  integer  t,  the  matrix  equality  P~^  =  J  holds. 
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Proof.  For  a  given  positive  integer  t,  use  the  Gauss-Jordan  method  to  reduce  the  concate¬ 
nated  matrix  [P,  I]  to  the  matrix  [I,  P~^],  where  /  is  the  compatibly  sized  identity  matrix.  In 
particular,  let  denote  the  result  of  row  reduction  after  the  /th  step,  so  that  =  [P,  I] 
and  =  [J,  P~^]  for  some  L  >  0.  Moreover,  let  denote  the  /cth  row  of  Then,  the 
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reduction  of  the  rectangular  matrix  [P,  I]  to  the  matrix  [I,  P  terminates  in  three  steps,  as 
given  by  the  following  row  recursions: 
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The  recursions  and  produce  all  zeros  below  and  above  the  diagonals  of  the  left-half 
partitions  of  and  respectively;  finally,  the  recursions  produce  ones  along  the 
diagonal  of  the  left-half  partition  of  leaving  G^^^  =  [I,  P~^].  Inspection  of  the  right-half 
partition  of  this  matrix  reveals  the  identity  P~^  =  J. 
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APPENDIX  C 

ADDING  A  CLUTTER  MODEL  TO  PMHT 


The  changes  required  to  add  a  clutter  distribution  to  the  PMHT  model  discussed  in 
section  5.2  are  presented  in  this  appendix. 

Let  Dt  denote  the  sensor  coverage  region  at  sampling  time  t,  and  let  V (Dt)  denote  the 
volume  of  this  region.  In  the  example  of  section  7.3,  the  coverage  region  Dt  is  the  square 
centered  at  the  true  position  of  the  target  at  time  t  and  with  sides  of  length  20r,  where  r  is  the 
xy-position  measurement  standard  deviation,  so  that  V (Dt)  =  400r^.  Let  u{s;  G)  denote  the 
uniform  density  function  with  support  G,  so  that 


0, 


if  s  G  G, 
otherwise, 
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and  let  tt^+i  denote  the  mixing  proportion  associated  with  the  clutter  source.  Then,  with 
the  inclusion  of  this  clutter  model,  the  observed  data  likelihood  function  (5-36)  for  the  linear 
Gauss-Markov  dynamic  mixture  becomes 

T  nt  m+1 

fY\e{y\0)  =  nnE  (C-2) 

i=i  j=i  j=i 

where 

(t){yti\Mjtliju  Rjt),  j  =  1, . . . ,  m, 
u{yti;Dt),  j=m  +  l. 

Also,  the  constraint  (4-13)  must  be  expanded  to  include  the  mixing  proportion  vr^+i.  The  im¬ 
pact  of  this  clutter  model  on  the  update  equations  and  information  matrix  computations  for  the 
linear  Gauss-Markov  mixture  is  for  the  most  part  confined  to  the  conditional  measurement- 
to-source  probabilities  (5-39).  These  probabilities  become,  at  the  kth  EM  iteration. 
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for  j  =  1, . . .  ,m  -f  1.  Additionally,  some  of  the  information  matrix  computations  (5-47) 
through  (5-57)  must  change  to  reflect  the  addition  of  the  clutter  source  to  the  mixture  model 
for  the  measurements.  These  changes  are  as  follows:  expression  (5-47)  becomes 


expressions  (5-49)  and  (5-51)  become 
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and 
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respectively;  expressions  (5-52)  and  (5-54)  become,  respectively, 
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finally,  expressions  (5-55)  and  (5-57)  become. 
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To  be  consistent  with  the  standard  tracking  assumption  that  at  most  one  observation 
at  each  sampling  time  is  associated  with  each  target,  the  following  heuristic  is  used  to  set 
the  target  mixing  proportions  given  fixed  probability  of  detection  Po,  clutter  density  Ac,  and 
sensor  coverage  region  volume  V  (assumed  here  to  be  constant): 


(C-12) 


The  clutter  mixing  proportion  is  then  vr^+i  =  1  —  tti  —  ■  ■  •  —  vr^-  This  heuristic  is  slightly 
different  than  the  one  proposed  by  Rago  et  al.  [38]  and  Willett  et  al.  [39].  In  particular,  the 
denominator  in  their  expression  is  a  function  of  the  number  rit  of  observations  at  time  t.  In 
either  case,  experience  indicates  that  PMHT  algorithm  performance  is  relatively  insensitive 
to  precise  values  of  the  mixing  proportions,  and  that  rough  approximations  such  as  (C-12)  are 
usually  adequate. 
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